I have been developing a blogging site, where people can post an article or blog and others can comments on them. The site is coded in Angular.js, ASP.NET and SQL is being used a data store.
For each article there can be ‘n’ number of comments and anyone who is registered in the site can put any number of comments in for an article. Each comment can have any number of replies, and the replies can also have replies (so it’s like a tree). So each comment will be like a tree and there shouldn’t be any restrictions on the number of levels. And each article in turn will have a collection of such comment-trees.
I am expecting the application to be extremely read-heavy and moderately write-heavy. What can be a good data store for storing such data?
You’re prematurely optimizing.
How much messages you’ll display on a page? One hundred? One thousand? One billion?
If the number of messages is too much for Microsoft SQL Server, how would you expect any browser to show that many messages to the user, and how would you expect any user to be actually interested in seeing hundreds of thousands¹ of messages on a single page? What’s the point?
When it comes to storing hierarchies, Microsoft SQL Server has hierarchical data feature. SQL Server also allows you to store XML, but I wouldn’t advise you to use it in this specific case: SQL Server’s XML fields are more for hierarchical data which doesn’t change too much through time.
By the way, talking about hierarchical structures, you’re not doing a service to your users. Tree structures are for developers; they are an extremely poor data visualization for things such as discussions, and should be avoided in this context. Flat structure of StackExchange, by opposition to old tree-oriented bulletin boards, is a good illustration of replacing an unusable approach by a very effective one.
¹ Actually, I imagine that Microsoft SQL Server would laugh at you if you tell him that you’re scared to load hundreds of thousands of messages. If your table has proper indexes and foreign keys, the query will virtually take milliseconds.
What about storing your comments in a table like the one below.
| parent | top parent/aritcle | comment | who | when | etc |
Having top parent column should allow you to get all relevant posts in operation. Then you can organise them in a tree, if needed.
As @arseni-mourzenko said – you don’t need to optimise it too early. Good enough is often good enough.