I have been studying software architecture and design patterns for the past weeks, and for a week now I can’t stop thinking about the performance problems that come with the flexibility of the data mapper pattern. In the book written by Martin Fowler, he gives an example of the pattern with some find methods and just one delete method, one update method and one insert method. So, when he has to update multiple objects, he takes multiple “trips” to the database. I started reading about the Unit Of Work pattern in the same book, but it looks like by its implementation that it doesn’t solve this problem either.
Is there a solution that solves this issues without losing too much flexibility?
Eugene Philipov ran a benchmark on multiple
INSERTs in one query and found out that they are really faster than running many sequential inserts after each other. This really did not come as a surprise because
INSERT is a very simple operation.
For updates the reason why you’d be taking multiple trips to a database is because the code is simply easier to understand by the programmer. Unless deemed completely necessary the performance drop will not be substantional.
The situation of updates behaving differently from inserts is caused by the
CASE/WHEN clause that must be present when you want to update attributes of multiple rows with different values. Because of that, the database not only performs search and update but also needs to decide which value should be actually used as a replacement for the current one.
You as a programmer always have the option to write a method which would update an attribute of multiple rows with the same value. On a database level this is very simple to do, you provide a constrait which rows should be affected (most likely using the
IN clause), you then provide the replacement and it’s done. No deciding on the database level what so ever, just a simple search and replace procedure.
I said that updating multiple rows with different values is not as performant as multiple
INSERT, but still, it will be slightly faster than taking multiple trips to the database, because that costs resources (mostly time – to connect to database and to execute each query). But unless you are talking about bulk-updating milions of rows, I don’t think you need to care.
You mentioned Martin Fowler in his example specifically takes multiple trips to the database. In the book he shows an example of the insert action, where each record is processed individually. Even so, Martin Fowler himself says the following:
To avoid multiple database calls, you can leave all your updates to
the end. To do this you need to keep track of all the objects that
have changed. You can use variables in your code for this, but they
soon become unmanageable once you have more than a few. Variables
often work fine with a Transaction Script (110), but they can be very
difficult with a Domain Model (116).
Rather than keep objects in variables you can give each object a dirty
flag that you set when the object changes. Then you need to find all
the dirty objects at the end of your transaction and write them out.
Source: Martin Fowler, P of EAA
The important part from the quote is: you can leave all your updates to the end. Note how he does not say you must do so, because in most cases it’s simply not necessary. From my experience, the bulk-operation (operation meaning
DELETE) hack will not completely solve your performance issue. If you have one, you are more likely to fix it by improving your architecture, introducing caching, maybe using different technologies and whatnot.
I think what you are concerned with is premature optimization. Yes, it is amazing to have a vision of a system which is blazing fast, but it’s even better having a system done, so you can actually use it.
So while Mr. Fowler does not supply an actual example of bulk updates, it is completely possible if/when needed. He most likely chose not to show an example of that, because the regular sequential approach usually works. That’s my assumption. His reason behind not putting it into the book might be different, but for that you need to ask him directly.
Keep in mind if the is a network between your program and the database, it is very easy to write code that becomes IO bound.
Also, if you are processing transactions you may have to take several trips to the database depending on the mechanism used to track transactions.