Is domain-driven design an anti-SQL pattern?

02/11/2022 softwareengineering

I am diving into domain-driven design (DDD), and the deeper I go, the more things I find that I don’t get. As I understand it, a main point is to split the Domain Logic (Business Logic) from the Infrastructure (DB, File System, etc.).

What I am wondering is, what happens when I have very complex queries like a Material Resource Calculation Query? In that kind of query you work with heavy set operations, the kind of thing that SQL was designed for. Doing those calculations inside the Domain Layer, and working with lots of sets there, feels like throwing away what SQL is good at.

Doing these calculations in the infrastructure isn’t an option either, because the DDD pattern is supposed to allow changes in the infrastructure without changing the Domain Layer, and since MongoDB doesn’t have the same capabilities as, e.g., SQL Server, that wouldn’t hold.

Is that a pitfall of the DDD pattern?

These days, you are likely to see reads (queries) handled differently than writes (commands). In a system with a complicated query, the query itself is unlikely to pass through the domain model (which is primarily responsible for maintaining the consistency of writes).

You are absolutely right that we should render unto SQL that which is SQL’s. So we’ll design a data model optimized around the reads, and a query against that data model will usually take a code path that does not include the domain model (with the possible exception of some input validation, to ensure that the query parameters are reasonable).
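A minimal sketch of that split (commonly called CQRS), assuming a hypothetical `orders` table: the command goes through a domain object that guards its invariant, while the query only validates its parameter and then hands the set work to SQL. All names here are illustrative, not taken from the answer.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Order:
    """Write-side domain object: it guards its own invariants."""
    order_id: int
    customer: str
    quantity: int

    def __post_init__(self) -> None:
        if self.quantity <= 0:
            raise ValueError("quantity must be positive")


def place_order(conn: sqlite3.Connection, order: Order) -> None:
    # Command path: the data has already passed through the domain object.
    conn.execute(
        "INSERT INTO orders (order_id, customer, quantity) VALUES (?, ?, ?)",
        (order.order_id, order.customer, order.quantity),
    )


def quantity_by_customer(conn: sqlite3.Connection, min_total: int) -> list[tuple[str, int]]:
    # Query path: validate the input, then let SQL do the set work.
    # No domain objects are loaded on this path.
    if min_total < 0:
        raise ValueError("min_total must be non-negative")
    return conn.execute(
        "SELECT customer, SUM(quantity) FROM orders "
        "GROUP BY customer HAVING SUM(quantity) >= ? ORDER BY customer",
        (min_total,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, quantity INTEGER)")
for o in [Order(1, "acme", 5), Order(2, "acme", 7), Order(3, "globex", 2)]:
    place_order(conn, o)
print(quantity_by_customer(conn, 3))  # [('acme', 12)]
```

In a larger system the read side would usually query a separate, read-optimized model rather than the write tables; a single table keeps the sketch short.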

As I understand it, a main point is to split the Domain Logic (Business Logic) from the Infrastructure (DB, File System, etc.).

This is the foundation of the misunderstanding. The purpose of DDD isn’t to separate things along a hard line like “this is in the SQL server, so it must not be BL”. The purpose of DDD is to separate domains and create barriers between them, so that the internals of one domain can be completely separate from the internals of another, and to define the shared externals between them.

Don’t think of “being in SQL” as the barrier between business logic and the data layer (BL/DL); that’s not what it is. Instead, think of “this is the end of the internal domain” as the barrier.

Each domain should have external-facing APIs that allow it to work with all the other domains. In the case of the data storage layer, it should have read/write (CRUD) actions for the data objects it stores. This means SQL itself isn’t really the barrier; the VIEW and PROCEDURE components are. You should never read directly from the tables: they are the implementation detail that, as DDD tells us, external consumers should not have to worry about.
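A sketch of a view acting as the external API, using SQLite (which has views but no stored procedures, so only the VIEW half is shown). The tables `stock_rows` and `warehouse_stock` and the view `stock_levels` are hypothetical. The consumer function never names a table, so the storage layout can be restructured while the consumer stays unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Internal detail of the data domain: consumers never touch this table.
conn.execute("CREATE TABLE stock_rows (sku TEXT, warehouse TEXT, on_hand INTEGER)")
conn.executemany(
    "INSERT INTO stock_rows VALUES (?, ?, ?)",
    [("bolt", "north", 40), ("bolt", "south", 60), ("nut", "north", 25)],
)

# External contract: a view with stable column names.
conn.execute(
    "CREATE VIEW stock_levels AS "
    "SELECT sku, SUM(on_hand) AS total_on_hand FROM stock_rows GROUP BY sku"
)


def read_stock_levels(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    # Consumer code depends only on the view, never on the tables behind it.
    return conn.execute(
        "SELECT sku, total_on_hand FROM stock_levels ORDER BY sku"
    ).fetchall()


before = read_stock_levels(conn)  # [('bolt', 100), ('nut', 25)]

# Later, the internals are restructured; only the view definition changes.
conn.executescript("""
    DROP VIEW stock_levels;
    CREATE TABLE warehouse_stock (sku TEXT, warehouse_id INTEGER, on_hand INTEGER);
    INSERT INTO warehouse_stock
        SELECT sku, CASE warehouse WHEN 'north' THEN 1 ELSE 2 END, on_hand
        FROM stock_rows;
    DROP TABLE stock_rows;
    CREATE VIEW stock_levels AS
        SELECT sku, SUM(on_hand) AS total_on_hand FROM warehouse_stock GROUP BY sku;
""")

after = read_stock_levels(conn)  # unchanged: [('bolt', 100), ('nut', 25)]
```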

Consider your example:

What I am wondering is, what happens when I have very complex queries like a Material Resource Calculation Query? In that kind of query you work with heavy set operations, the kind of thing that SQL was designed for.

This is exactly what should be in SQL, then, and it’s not a violation of DDD; it’s what DDD is for. With that calculation in SQL, it becomes part of the BL/DL boundary. What you would do is expose it through a separate view, stored procedure, or what-have-you, and keep the business logic separated from the data layer, since that is your external API. In fact, your data layer should be another DDD domain layer, with its own abstractions for working with the other domain layers.
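As an illustration of a set-heavy calculation that stays in SQL behind a single function, here is a bill-of-materials explosion (the core of a material requirements calculation) written as a recursive CTE in SQLite. The `bom` table, its data, and `material_requirements` are hypothetical; a real MRP run would also net out on-hand stock and lead times, which this sketch ignores.

```python
import sqlite3

# Hypothetical bill of materials: each parent needs `qty` of each child.
BOM = [
    ("bike", "wheel", 2), ("bike", "frame", 1),
    ("wheel", "spoke", 32), ("wheel", "rim", 1), ("wheel", "bolt", 2),
    ("frame", "tube", 3), ("frame", "bolt", 4),
]

# Walk down the BOM tree, multiplying quantities at each level, then sum the
# requirements for components reached by more than one path (e.g. bolts).
EXPLODE_SQL = """
WITH RECURSIVE explode(item, qty) AS (
    SELECT :item, :qty
    UNION ALL
    SELECT b.child, e.qty * b.qty
    FROM explode AS e JOIN bom AS b ON b.parent = e.item
)
SELECT item, SUM(qty) FROM explode
WHERE item <> :item
GROUP BY item
ORDER BY item
"""


def material_requirements(conn: sqlite3.Connection, item: str, quantity: int) -> list[tuple[str, int]]:
    # The domain layer sees only this function; the set logic lives in SQL.
    return conn.execute(EXPLODE_SQL, {"item": item, "qty": quantity}).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bom (parent TEXT, child TEXT, qty INTEGER)")
conn.executemany("INSERT INTO bom VALUES (?, ?, ?)", BOM)
print(material_requirements(conn, "bike", 10))
# [('bolt', 80), ('frame', 10), ('rim', 20), ('spoke', 640), ('tube', 30), ('wheel', 20)]
```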

Doing these calculations in the infrastructure isn’t an option either, because the DDD pattern is supposed to allow changes in the infrastructure without changing the Domain Layer, and since MongoDB doesn’t have the same capabilities as, e.g., SQL Server, that wouldn’t hold.

That’s another misunderstanding. DDD says that internal implementation details can change without changing other domain layers. It doesn’t say you can just replace a whole piece of infrastructure.

Again, keep in mind that DDD is about hiding internals behind well-defined external APIs. Where those APIs sit is a totally different question, and DDD doesn’t define that. It simply says that these APIs exist and should never change.

DDD isn’t set up to let you replace MSSQL with MongoDB ad hoc: those are two totally different infrastructure components.

Instead, let’s use an analogy for what DDD defines: gas vs. electric cars. The two vehicles have completely different methods of creating propulsion, but they have the same APIs: an on/off switch, a throttle and brake, and wheels to propel the vehicle. DDD says that we should be able to replace the engine (gas or electric) in our car. It doesn’t say we can replace the car with a motorcycle, and that’s effectively what MSSQL → MongoDB is.

If you’ve ever been on a project where the organization paying to host the application decides that the database licenses are too expensive, you’ll appreciate the ease with which you can migrate your database/data storage. All things considered, though, while this does happen, it doesn’t happen often.

You can get the best of both worlds, so to speak. If you consider performing the complex functions in the database an optimization, then you can use an interface to inject an alternate implementation of the calculation. The problem is that you then have to maintain the logic in multiple locations.
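A minimal sketch of that idea, assuming a hypothetical `InventoryCounter` interface with a plain in-memory implementation and a SQL-backed one. The domain function `restock_needed` receives whichever implementation is injected. Running both implementations against the same data, as the usage below does, is one way to keep the duplicated logic from drifting apart.

```python
import sqlite3
from collections import Counter
from typing import Protocol


class InventoryCounter(Protocol):
    """The contract the domain code depends on."""
    def count_by_type(self) -> dict[str, int]: ...


class InMemoryCounter:
    """Reference implementation: plain code over objects already in memory."""
    def __init__(self, product_types: list[str]) -> None:
        self._types = product_types

    def count_by_type(self) -> dict[str, int]:
        return dict(Counter(self._types))


class SqlCounter:
    """Optimized implementation: pushes the aggregation into the database."""
    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def count_by_type(self) -> dict[str, int]:
        rows = self._conn.execute(
            "SELECT product_type, COUNT(*) FROM products GROUP BY product_type"
        ).fetchall()
        return dict(rows)


def restock_needed(counter: InventoryCounter, minimum: int) -> list[str]:
    # Domain logic: unaware of which implementation was injected.
    return sorted(t for t, n in counter.count_by_type().items() if n < minimum)


product_types = ["widget", "widget", "gadget"]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_type TEXT)")
conn.executemany("INSERT INTO products VALUES (?)", [(t,) for t in product_types])

for impl in (InMemoryCounter(product_types), SqlCounter(conn)):
    print(restock_needed(impl, 2))  # ['gadget'] for both
```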

Deviating from an architectural pattern

When you find yourself at odds with implementing a pattern purely, or deviating from it in some area, you have a decision to make. A pattern is simply a templated way of doing things that helps organize your project. At this point, take time to evaluate:

Is this the right pattern? (many times it is, but sometimes it’s just a bad fit)
Should I deviate in this one way?
Just how far have I deviated so far?

You’ll find that some architectural patterns are a good fit for 80-90% of your application, but not so much for the remaining bits. The occasional deviation from the prescribed pattern is useful for performance or logistical reasons.

However, if you find that your cumulative deviations amount to a good deal more than 20% of your application architecture, it’s probably just a bad fit.

If you choose to keep going with the architecture, do yourself a favor and document where and why you deviated from the prescribed way of doing things. When an enthusiastic new member joins your team, you can point them to that documentation, including the performance measurements and justifications. That will reduce the likelihood of repeated requests to fix the “problem”. The documentation will also help disincentivize rampant deviations.

The set manipulation logic that SQL is good at can be integrated with DDD no problem.

Say, for example, I need to know some aggregate value, such as the total count of products by type. That’s easy to run in SQL, but slow if I load every product into memory and add them all up.

I simply introduce a new Domain object,

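using System;
using System.Collections.Generic;

// Read-only domain object for an aggregate; the property types are assumed
public record ProductInventory(
    string ProductType,
    int TotalCount,
    DateTime DateTimeTaken);

and a method on my repository

public abstract class ProductRepository
{
    // Body left to the implementation, e.g. a GROUP BY query run in the database
    public abstract List<ProductInventory> TakeInventory(DateTime asOfDate);
}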

Sure, maybe I am now relying on my DB having certain abilities. But I still technically have the separation, and as long as the logic is simple, I can argue that it is not ‘business logic’.
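For illustration, a Python/SQLite sketch of how such a repository method might be implemented. It assumes a hypothetical `products` table that stores each product’s type and the ISO-8601 timestamp at which it was received. Only one row per product type comes back from the database, instead of every product.

```python
import sqlite3
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ProductInventory:
    product_type: str
    total_count: int
    date_time_taken: datetime


class ProductRepository:
    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def take_inventory(self, as_of_date: datetime) -> list[ProductInventory]:
        # The aggregation runs in the database. Timestamps are stored as
        # ISO-8601 strings in one format, so string comparison orders them.
        rows = self._conn.execute(
            "SELECT product_type, COUNT(*) FROM products "
            "WHERE received_at <= ? GROUP BY product_type ORDER BY product_type",
            (as_of_date.isoformat(),),
        ).fetchall()
        return [ProductInventory(t, n, as_of_date) for t, n in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_type TEXT, received_at TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("bolt", datetime(2022, 1, 5).isoformat()),
     ("bolt", datetime(2022, 1, 20).isoformat()),
     ("nut", datetime(2022, 3, 1).isoformat())],
)
repo = ProductRepository(conn)
for row in repo.take_inventory(datetime(2022, 2, 1)):
    print(row.product_type, row.total_count)  # bolt 2 (the nut arrives later)
```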

One possible way to resolve this dilemma is to think of SQL as an assembly language: you rarely, if ever, code directly in it, but where performance matters you need to be able to understand the code produced by your C/C++/Golang/Rust compiler, and maybe even write a tiny snippet in assembly if you cannot change the code in your high-level language to produce the desired machine code.

Similarly, in the realm of databases, various SQL libraries (some of which are ORMs), e.g. SQLAlchemy and the Django ORM for Python, or LINQ for .NET, provide higher-level abstractions yet generate SQL where possible to achieve performance. They also provide some portability across databases, although performance may differ, e.g. between Postgres and MySQL, because some operations use more efficient DB-specific SQL.

And just as with high-level languages, it is critical to understand how SQL works, if only to be able to rearrange the queries issued through the above-mentioned SQL libraries to achieve the desired efficiency.
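In SQLite, `EXPLAIN QUERY PLAN` plays roughly the role of reading compiler output: it shows whether a statement, hand-written or generated by a library, will scan a table or use an index. A sketch with a hypothetical `products` table follows. The exact wording of the plan text varies between SQLite versions (older ones print `SCAN TABLE products`, for example).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, product_type TEXT, price REAL)")

QUERY = "SELECT id, price FROM products WHERE product_type = ?"


def plan(conn: sqlite3.Connection, sql: str, params: tuple) -> list[str]:
    # Each EXPLAIN QUERY PLAN row ends with a human-readable "detail" column.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


before = plan(conn, QUERY, ("bolt",))
print(before)  # e.g. ['SCAN products']: a full table scan

conn.execute("CREATE INDEX idx_products_type ON products (product_type)")
after = plan(conn, QUERY, ("bolt",))
print(after)  # e.g. ['SEARCH products USING INDEX idx_products_type (product_type=?)']
```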

P.S. I would rather make this a comment but I do not have sufficient reputation for that.

As usual, this is one of those things that depends on a number of factors. It’s true that there’s a lot that you can do with SQL. There are also challenges with using it and some practical limitations of relational databases.

As Jared Goguen notes in the comments, SQL can be very difficult to test and verify. The main reason is that it can’t (in general) be decomposed into components. In practice, a complex query must be considered in toto. Another complicating factor is that the behavior and correctness of SQL are highly dependent on the structure and content of your data. This means that testing all the possible scenarios (or even determining what they are) is often infeasible or impossible. Refactoring SQL and modifying the database structure are likewise problematic.
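A small, well-known example of correctness depending on data content rather than on the query text is `NOT IN` against a subquery that happens to contain a NULL. The tables below are hypothetical. Because of SQL’s three-valued logic, the same query on the same schema returns nothing once one NULL appears, while the `NOT EXISTS` form is unaffected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER);
    CREATE TABLE blocked (customer_id INTEGER);
    INSERT INTO customers VALUES (1), (2), (3);
    INSERT INTO blocked VALUES (2);
""")

QUERY = ("SELECT id FROM customers "
         "WHERE id NOT IN (SELECT customer_id FROM blocked) ORDER BY id")
SAFE = ("SELECT id FROM customers c WHERE NOT EXISTS "
        "(SELECT 1 FROM blocked b WHERE b.customer_id = c.id) ORDER BY id")

without_null = conn.execute(QUERY).fetchall()
print(without_null)  # [(1,), (3,)]

# Same query, same schema: a single NULL row changes the answer.
conn.execute("INSERT INTO blocked VALUES (NULL)")
print(conn.execute(QUERY).fetchall())  # []
print(conn.execute(SAFE).fetchall())   # [(1,), (3,)]
```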

The other big factor that has led to moving away from SQL is that relational databases tend to scale only vertically. For example, when you build complex calculations in SQL to run in SQL Server, they are going to execute on the database. That means all of that work uses resources on the database server. The more you do in SQL, the more memory and CPU your database will need. It’s often less efficient to do these things on other systems, but there’s no practical limit to the number of additional machines you can add to such a solution. This approach is less expensive and more fault-tolerant than building a monster database server.
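The scale-out alternative usually takes a partition-and-merge shape: each node aggregates its own shard, and the partial results are combined. The sketch below only illustrates that shape. The threads stand in for separate machines, and the data and function names are hypothetical; a real deployment would distribute the shards over the network.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_partition(rows: list[tuple[str, int]]) -> Counter:
    # The work one node would do on its own shard of the data.
    totals: Counter = Counter()
    for product_type, qty in rows:
        totals[product_type] += qty
    return totals


def scaled_out_totals(rows: list[tuple[str, int]], workers: int = 4) -> Counter:
    # Split the rows into one shard per worker (round-robin).
    shards = [rows[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_partition, shards))
    # Merge step: partial sums combine because addition is associative.
    # (Counter addition drops non-positive counts; quantities here are positive.)
    return sum(partials, Counter())


data = [("bolt", 3), ("nut", 5), ("bolt", 4), ("washer", 1), ("nut", 2)]
print(dict(sorted(scaled_out_totals(data).items())))  # {'bolt': 7, 'nut': 7, 'washer': 1}
```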

These issues may or may not apply to the problem at hand. If you are able to solve your problem with available database resources, maybe SQL is fine for your problem-space. You need to consider growth, however. It might be fine today but a few years down the road, the cost of adding additional resources may become a problem.

Is that a pitfall of the DDD pattern?

Let me first clear a few misconceptions.

DDD is not a pattern. And it doesn’t really prescribe patterns.

The preface to Eric Evans’s DDD book states:

Leading software designers have recognized domain modeling and design as critical topics for at least 20 years, yet surprisingly little has been written about what needs to be done or how to do it. Although it has never been formulated clearly, a philosophy has emerged as an undercurrent in the object community, a philosophy I call domain-driven design.

[…]

A feature common to the successes was a rich domain model that evolved through iterations of design and became part of the fabric of the project.

This book provides a framework for making design decisions and a technical vocabulary for discussing domain design. It is a synthesis of widely accepted best practices along with my own insights and experiences.

So, it’s a way to approach software development and domain modeling, plus some technical vocabulary that supports those activities (a vocabulary that includes various concepts and patterns). It’s also not something completely new.

Another thing to keep in mind is that a domain model is not its OO implementation in your system; that’s just one way to express it, or to express some part of it. A domain model is the way you think about the problem you are trying to solve with the software. It’s how you understand and perceive things, how you talk about them. It’s conceptual. But not in some vague sense. It’s deep and refined, and is a result of hard work and knowledge gathering. It is refined further and likely evolves over time, and it involves implementation considerations (some of which may constrain the model). It should be shared by all team members (and involved domain experts), and it should drive how you implement the system, so that the system closely reflects it.

Nothing about that is inherently pro- or anti-SQL, although OO developers are perhaps generally better at expressing the model in OO languages, and the expression of many domain concepts is better supported by OOP. But sometimes parts of the model must be expressed in a different paradigm.

What I am wondering is, what happens when I have very complex queries […]?

Well, generally speaking there are two scenarios here.

In the first case, some aspect of a domain really requires a complex query, and perhaps that aspect is best expressed in the SQL/relational paradigm, so use the appropriate tool for the job. Reflect those aspects in your domain thinking and in the language used to communicate concepts. If the domain is complex, perhaps this is part of a subdomain with its own bounded context.

The other scenario is that the perceived need to express something in SQL is a result of constrained thinking. If a person or a team has always been database oriented in their thinking, it may be difficult for them, just due to inertia, to see a different way of approaching things. This becomes a problem when the old way fails to meet the new needs, and requires some thinking out of the box. DDD, as an approach to design, is in part about ways to find your way out of that box by gathering and distilling the knowledge about the domain. But everybody seems to ignore that part of the book, and focuses on some of the technical vocabulary and patterns listed.

SQL became popular when memory was expensive, because the relational data model made it possible to normalise your data and store it efficiently in the file system.

Now memory is relatively cheap, so we can skip normalisation and store data in the format we use it in, or even duplicate a lot of the same data for the sake of speed.

Consider the database a simple IO device whose responsibility is to store data in the file system. Yes, I know that is difficult to imagine, because we have written plenty of applications with important business logic in SQL queries, but just try to imagine that SQL Server is just another printer.

Would you embed a PDF generator in the printer driver, or add a trigger that prints a log page for every sales order printed on that printer?

I assume the answer is no, because we don’t want our application to be coupled to a specific device type (not to mention the efficiency of such an idea).

From the ’70s through the ’90s, SQL databases were efficient. Now? I’m not sure; in some scenarios, asynchronous data queries will return the required data faster than multiple joins in a SQL query.

SQL wasn’t designed for complicated queries; it was designed to store data efficiently and then provide an interface/language to query the stored data.

I would say that building your application around a relational data model with complicated queries is an abuse of the database engine. Of course, database engine vendors are happy when you tightly couple your business to their product; they will be more than happy to provide more features that make this bond stronger.
