Caching data vs. caching output


I don’t have much experience with caching in web applications, but I am trying to educate myself in the field and try out various techniques and approaches in ASP.NET MVC.

Most tutorials and articles I see online mention output caching ([OutputCache(VaryByParam="...")]), various donut caching methods, etc.
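A key idea behind VaryByParam is that the cached page is keyed by the request path plus only the parameters you declare. Other parameters, such as tracking tags, do not create separate cache entries. The sketch below illustrates that idea in Python. It is not ASP.NET's actual implementation, and `output_cache_key` is a hypothetical name. It also does not model special values such as "none" or "*".

```python
from urllib.parse import urlsplit, parse_qs

def output_cache_key(url: str, vary_by_param: list[str]) -> str:
    """Build an output-cache key from the path plus only the varied query parameters."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    # Keep only the declared parameters, sorted so their order in the URL doesn't matter.
    kept = [(name, ",".join(query.get(name, []))) for name in sorted(vary_by_param)]
    return parts.path + "?" + "&".join(f"{k}={v}" for k, v in kept)
```

With this key, `/products?page=2&utm_source=mail` and `/products?page=2&utm_source=ads` share one cached page, while `page=3` gets its own.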

Only a small fraction of the articles I see discuss caching query results on top of the data/business layer, which can be implemented with System.Runtime.Caching or some other key-value store.

I’ve always thought that the usual bottlenecks in web applications are expensive or frequently executed queries, not the rendering into HTML. At least on the projects I have worked on, it has always been this way.

Subjectively, output caching also feels like a “dirtier” method than caching at the data/business layer. Why is it promoted so much? Is it only because it’s easier to implement, or am I missing something?

Does output caching actually provide benefits over caching at the data/business layer? In which kinds of web applications should I focus on which? How do I find the balance between the two?

Caching is generally considered a difficult topic in web development, and it’d be great to get more insight into output caching vs. data caching: their differences, strengths and weaknesses.


Output caching has big benefits in applications that need to perform complex queries or calculations before displaying data, or that spend a significant part of their time rendering (turning the data into HTML). This is not just about running a complicated query but also about performing calculations on the data. And for high-volume websites, even the simple act of rendering HTML for a list of products from the database can consume a huge amount of resources.

With an output cache there is no need to query the database, return the results, parse them and turn them into HTML. A request can hit a web server and be answered there without ever reaching an application server.
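The sketch below makes the saving concrete. On a cache hit, neither the query step nor the rendering step runs, and the stored HTML is returned as-is. It is a simplified Python illustration with hypothetical function names and sample rows. A real output cache would also expire entries after a configured duration and would often live in front of the application, for example in a reverse proxy or CDN.

```python
calls = {"query": 0, "render": 0}

def query_products() -> list[tuple[str, float]]:
    """Hypothetical stand-in for the database round trip."""
    calls["query"] += 1
    return [("Widget", 9.99), ("Gadget", 19.99)]

def render_products(rows: list[tuple[str, float]]) -> str:
    """Hypothetical stand-in for the view engine turning data into HTML."""
    calls["render"] += 1
    items = "".join(f"<li>{name}: {price:.2f}</li>" for name, price in rows)
    return f"<ul>{items}</ul>"

page_cache: dict[str, str] = {}  # path -> finished HTML (no expiry in this sketch)

def handle_request(path: str) -> str:
    if path in page_cache:
        return page_cache[path]  # hit: no query, no rendering
    html = render_products(query_products())
    page_cache[path] = html
    return html
```

After two requests for `/products`, `calls` shows one query and one render. The second response came straight from `page_cache`.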

Output caching in general is a lot easier to do: you look at the output generated by your application and decide that this output does not need to be generated dynamically at every request, so you cache it.

Application data caching is a lot more complicated because you have to decide at a very granular level what to cache. It also has big consequences for your codebase, because the caching code is intermixed with your application code and you HAVE to worry about things like cache invalidation. Output caching, by contrast, sits ‘on top’ of your application, and you do not even have to write your application with this kind of caching in mind.
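The sketch below shows what "intermixed with your application code" tends to look like. The read path fills the cache, and every write path must remember to evict the affected key, or readers keep getting stale data. `ProductRepository` and its in-memory "database" are hypothetical, written in Python for illustration only.

```python
class ProductRepository:
    """Hypothetical data layer with a read-through cache and invalidation on write."""

    def __init__(self) -> None:
        self._db = {1: "Widget"}          # stand-in for the products table
        self._cache: dict[str, str] = {}  # cached reads, keyed by "product:<id>"
        self.db_reads = 0

    def get_name(self, product_id: int) -> str:
        key = f"product:{product_id}"
        if key not in self._cache:
            self.db_reads += 1            # only a miss touches the database
            self._cache[key] = self._db[product_id]
        return self._cache[key]

    def rename(self, product_id: int, new_name: str) -> None:
        self._db[product_id] = new_name
        # The write path must invalidate the cached entry; forgetting this line serves stale names.
        self._cache.pop(f"product:{product_id}", None)
```

Every new write path, such as a bulk import or an admin edit, has to repeat that eviction. This is the bookkeeping output caching lets you avoid, at the cost of coarser control.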

It is hard to say when you should use output caching and when you should use application data caching. The line between the two also blurs at a certain point. If you cache the result of an expensive calculation but combine it with data from a real-time query because you want that part to be real-time, is the result of that calculation ‘application data’ cached or ‘output’ cached?

