Design of a rating system

I applied for a developer job at a company that sells gourmet food boxes based on recipes. I was turned down after doing a tech test, so I asked why I failed so that I can improve next time.

They said my design for the rating system was subpar (they basically wanted a system for rating recipes). Some basic context: I had two entities, ‘recipe’ and ‘rating’, with a one-to-one bidirectional relation: a recipe has a rating. The rating entity had the attributes ‘numberVotes’, ‘totalPoints’ and ‘average’.

In my system you would do a POST request to

/recipe/rate/{recipieID}

with a JSON body that looks like

{"rating":4}

My rating function would then look up the rating for the recipe using {recipieID} and create one if there wasn’t one. Every request would then take the rating for the given recipe, add 1 to the ‘numberVotes’ attribute, add the posted rating to ‘totalPoints’, then recalculate the average and update the ‘average’ attribute.
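To make the update concrete, here is a minimal sketch of the logic described above. It assumes an in-memory dict in place of the database, and the names `Rating`, `ratings` and `rate_recipe` are made up for illustration; only the three attributes come from the design itself.

```python
from dataclasses import dataclass


@dataclass
class Rating:
    number_votes: int = 0
    total_points: int = 0
    average: float = 0.0


# Hypothetical in-memory stand-in for the rating table, keyed by recipe ID.
ratings: dict[int, Rating] = {}


def rate_recipe(recipe_id: int, rating: int) -> Rating:
    """Apply one posted rating to the aggregate kept for a recipe."""
    record = ratings.setdefault(recipe_id, Rating())  # created on the first vote
    record.number_votes += 1
    record.total_points += rating
    record.average = record.total_points / record.number_votes
    return record


rate_recipe(1, 4)
result = rate_recipe(1, 5)
print(result)  # Rating(number_votes=2, total_points=9, average=4.5)
```

Note that nothing here records who voted, which is the point the replies pick up on.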

The lead dev said this:

but you would be unable to retrieve the resource of a single rating.
Also you are not updating a rating with a particular recipe but a
recipe with a particular rating. So POST:/rate/recipe/{id} should be
PUT:/recipe/{id}/rating

In my mind this is horrible design. He’s suggesting you would want to store each individual rating submitted by a user. If you had 10 million users rating an item, do you really want to store 10 million pieces of junk in your database, just so you can find out how Gertrude rated your key lime pie out of 5?

Then when you want to get the actual rating of the key lime pie, are you going to retrieve all 10 million results and average them? I’m sure there is a use case for storing individual user ratings for user analysis, but I’m sure you would still want to store the average rating for quick retrieval rather than doing resource-intensive lookups every time you want the rating of a recipe. But in the context of the tech test, it only asked to ‘store a rating for recipes’.

What are your thoughts?

3

If you want to keep one user from rating one recipe multiple times, you need to save which user rated which recipe.
Let us continue thinking: if you want to allow a user to remove or change their rating, you need to store both the user that rated a recipe and the rating they gave.

Additionally, by saving those ratings you can do interesting analytics (recipe recommendations based on previously well-rated recipes, etc.).

You could still store the current average rating of each recipe and update it whenever a new rating comes in.

Your thinking was too narrow, and you attempted a premature optimization that would have reduced the capabilities of the final system. 10 million rows is nothing; we are living in the age of big data!
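As a rough sketch of what this could look like, the class below keeps one rating per user and still maintains the running total, so the average stays cheap to read. The class and method names are hypothetical, and a plain dict stands in for whatever storage the real system would use.

```python
from typing import Optional


class RecipeRatings:
    """Keeps one rating per user for a recipe and maintains the running total."""

    def __init__(self) -> None:
        self.by_user: dict[str, int] = {}  # user ID -> that user's rating
        self.total_points = 0

    def rate(self, user_id: str, rating: int) -> None:
        # A second rating by the same user replaces the first one.
        previous = self.by_user.get(user_id, 0)
        self.total_points += rating - previous
        self.by_user[user_id] = rating

    def remove(self, user_id: str) -> None:
        # Removing a rating that does not exist is a no-op.
        self.total_points -= self.by_user.pop(user_id, 0)

    @property
    def average(self) -> Optional[float]:
        if not self.by_user:
            return None  # no ratings yet
        return self.total_points / len(self.by_user)


pie = RecipeRatings()
pie.rate("gertrude", 5)
pie.rate("gertrude", 3)  # Gertrude changes her mind; still one vote
pie.rate("bob", 4)
print(pie.average)  # 3.5
pie.remove("bob")
print(pie.average)  # 3.0
```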

4

I think the only mistake you made was not asking one simple question before you started designing anything:

What are the requirements of the system?

As you can see, all the answers contain a series of ifs (if you want to keep track of this, or of that, etc.). OK, there are requirements that you can infer because they are common, but there are others that you can’t infer unless the “stakeholders” tell you what their system needs to do.

3

There are several shortcomings of your solution:

  • Based on the choice of your entities, the averageRating is computed as totalPoints/numRatings. If you also store the average in the database, you are introducing redundancy (which might lead to inconsistent data).
  • If you don’t keep track of the distinct user votes, there is no way to prevent users or bots from casting multiple votes and thus distorting the overall rating of a recipe. If you require users to specify, e.g., their email address to cast a vote, and define (email, recipieId) as the primary key in your database, you prevent multiple votes.
  • You mentioned that storing the distinct votes might cause a performance issue. While this might be true, storing the average in the database is premature optimization. If the average calculation becomes a bottleneck, you can easily add a caching layer to resolve the issue.
  • I think that the rating functionality of Google Shopping requires distinct ratings to be present in order to be displayed.
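The primary-key idea from the second point can be sketched with SQLite as below. The table layout, the column names, the 1 to 5 range check and the helper functions are all assumptions made for illustration; the average is computed on demand rather than stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE rating (
        email     TEXT    NOT NULL,
        recipe_id INTEGER NOT NULL,
        rating    INTEGER NOT NULL CHECK (rating BETWEEN 1 AND 5),
        PRIMARY KEY (email, recipe_id)  -- one vote per email per recipe
    )
""")


def cast_vote(email: str, recipe_id: int, rating: int) -> bool:
    """Insert a vote; return False if the database rejects it (e.g. a duplicate)."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO rating (email, recipe_id, rating) VALUES (?, ?, ?)",
                (email, recipe_id, rating),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def average_rating(recipe_id: int):
    """Compute the average on demand; returns None when there are no votes."""
    row = conn.execute(
        "SELECT AVG(rating) FROM rating WHERE recipe_id = ?", (recipe_id,)
    ).fetchone()
    return row[0]


first = cast_vote("a@example.com", 1, 5)      # accepted
duplicate = cast_vote("a@example.com", 1, 1)  # rejected by the primary key
second = cast_vote("b@example.com", 1, 4)     # accepted
print(first, duplicate, second, average_rating(1))  # True False True 4.5
```

If the `AVG` query ever became a bottleneck, a cache in front of `average_rating` would be one way to address it without changing the schema.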

If you aggregate data, you immediately lose the raw data, which is a big no-no when it comes to data mining.

As well as paying the cost of aggregating the data with each rating, you will have lost all information about time, so you’d have no idea about the density of ratings, the first rating, the last rating, the popularity, etc.

Storing the maximum amount of information affords the most flexibility. If you start to second-guess what the end users of the data want and cut corners, you severely limit the functionality of the end system.

If it turns out the data is required in a rolled-up format, it would then be very easy to add a view/schema/reporting DB to support this.
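For example, a view over the raw ratings could provide the rolled-up numbers while the individual rows and their timestamps stay available. This is only a sketch using SQLite; the table, the view name `recipe_rating_summary`, the columns and the sample rows are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rating (
        user_id   TEXT    NOT NULL,
        recipe_id INTEGER NOT NULL,
        rating    INTEGER NOT NULL,
        rated_at  TEXT    NOT NULL,  -- ISO 8601 timestamp
        PRIMARY KEY (user_id, recipe_id)
    );

    -- Rolled-up format derived from the raw rows; nothing is stored twice.
    CREATE VIEW recipe_rating_summary AS
    SELECT recipe_id,
           COUNT(*)      AS number_votes,
           SUM(rating)   AS total_points,
           AVG(rating)   AS average,
           MIN(rated_at) AS first_rated_at,
           MAX(rated_at) AS last_rated_at
    FROM rating
    GROUP BY recipe_id;
""")

# Illustrative sample data only.
conn.executemany(
    "INSERT INTO rating VALUES (?, ?, ?, ?)",
    [
        ("u1", 1, 5, "2024-01-01T10:00:00"),
        ("u2", 1, 3, "2024-01-03T12:30:00"),
    ],
)

summary = conn.execute(
    "SELECT * FROM recipe_rating_summary WHERE recipe_id = 1"
).fetchone()
print(summary)
# (1, 2, 8, 4.0, '2024-01-01T10:00:00', '2024-01-03T12:30:00')
```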
