I often come across this dilemma in my own code, and I wonder whether there is a term for it and whether there is a single solution. Let me illustrate it with a pseudocode example of building a table from some book info:
**Pure option**
```
make_table(bi, ab, pd): [p]
    t = make_table_from_book_info(bi) [p]
    t = add_author_bios_to_table(t, ab) [p]
    t = add_publisher_details_to_table(t, pd) [p]
    return t

bi = get_book_info()
ab = get_author_bios()
pd = get_publisher_details()
t = make_table(bi, ab, pd) [p]
```
advantage: great separation of I/O code and pure [p] functions
disadvantage: no modules; if you want to, e.g., delete the publisher parts, you have to do it in two places.
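For concreteness, here is how the pure option might look in runnable Python. The function names mirror the pseudocode above; the loaders are stubbed with made-up sample data, so treat the whole thing as an illustrative sketch rather than real I/O code:

```python
# Pure [p] functions: data in, data out, no I/O.
def make_table_from_book_info(bi):
    return {"title": bi["title"], "author": bi["author"], "publisher": bi["publisher"]}

def add_author_bios_to_table(t, ab):
    return {**t, "author_bio": ab.get(t["author"], "")}

def add_publisher_details_to_table(t, pd):
    return {**t, "publisher_details": pd.get(t["publisher"], {})}

def make_table(bi, ab, pd):
    t = make_table_from_book_info(bi)
    t = add_author_bios_to_table(t, ab)
    t = add_publisher_details_to_table(t, pd)
    return t

# I/O block (stubbed with constants for the sketch).
def get_book_info():
    return {"title": "Dune", "author": "Herbert", "publisher": "Chilton"}

def get_author_bios():
    return {"Herbert": "Frank Herbert, 1920-1986"}

def get_publisher_details():
    return {"Chilton": {"city": "Philadelphia"}}

bi = get_book_info()
ab = get_author_bios()
pd = get_publisher_details()
t = make_table(bi, ab, pd)
```

Note that `make_table` can be unit-tested with plain dictionaries and never touches the loaders.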
**Modular option**
get_book_info_and_make_table():
bi = get_book_info()
t = make_table_from_book_info(bi) [p]
return t
get_author_bios_and_add_to_table(t):
ab = get_author_bios()
t = add_author_bios_to_table(t, ab) [p]
return t
get_publisher_details_and_add_to_table(t):
pd = get_publisher_details()
t = add_publisher_details_to_table(t, pb) [p]
return t
t = get_book_info_and_make_table()
t = get_author_bios_and_add_to_table(t)
t = get_publisher_details_and_add_to_table(t)
advantage: modules
disadvantage: weaker separation of I/O code and pure [p] functions
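The modular option in the same Python sketch (again, the I/O is stubbed inline with sample data and all names are illustrative). Each feature module bundles its own I/O with its pure transformation:

```python
def get_book_info_and_make_table():
    bi = {"title": "Dune", "author": "Herbert"}   # stand-in for real I/O
    return {"title": bi["title"], "author": bi["author"]}   # pure [p] part

def get_author_bios_and_add_to_table(t):
    ab = {"Herbert": "Frank Herbert, 1920-1986"}  # stand-in for real I/O
    return {**t, "author_bio": ab.get(t["author"], "")}     # pure [p] part

def get_publisher_details_and_add_to_table(t):
    pd = {"Dune": {"publisher": "Chilton"}}       # stand-in for real I/O
    return {**t, **pd.get(t["title"], {})}                  # pure [p] part

t = get_book_info_and_make_table()
t = get_author_bios_and_add_to_table(t)
t = get_publisher_details_and_add_to_table(t)
```

Here deleting the publisher feature is a one-place change, but testing any one function without I/O requires extra work (e.g. monkeypatching the loader).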
I like having these blocks in my code (pure option): block 1 input, block 2 transformations (and block 3 output). I can cache state after block 1 while developing, but maybe that advantage goes away once I write my unit tests earlier. And maybe having a pure make_table (pure option) to unit test is not such a big plus, since there are all these extra parts that may or may not be in the table.
Perhaps the modular option only makes sense if its code is in a separate file; its unit tests would then live in a separate file as well.
Looking forward to your thoughts
This is what I call a “matrix problem”: we can structure our system along two or more axes. Here, we have one axis relating to features (book info, author details, publisher info), and an axis relating to different steps or layers (I/O to load info, assembling the table). Our actual code will fill the full matrix of features × steps combinations.
But textual programming languages are linear, so we have to somehow sort this 2D-matrix of code elements. On a fundamental level, both axes are equivalent and we can freely choose what to do. Both of your solutions are perfectly valid.
But there are trade-offs for organizing the code one way or another.
First, let’s consider organizing by feature, which you call the “modular option”.
- This keeps all of the code relating to one feature together, making it easy to see the entire data flow for that feature. It is easy to add new features, or to change the behaviour of one feature since they are well isolated.
- However, it is difficult to make changes to cross-cutting concerns, such as changing how data is loaded and how the table is assembled. To perform such changes, we would have to touch all modules/features.
- While it is easy to perform tests for each feature in isolation, those tests will now generally have to be end-to-end tests. In your example, testing the table update would probably also involve I/O.
This strategy is somewhat common across the software stack, for example with microservices architectures, micro-frontends, or components in React/Vue. Compare also the concept of “bounded contexts” in domain-driven design.
Alternatively, let’s consider organizing by layer, which you call the “pure option”.
We group code on the same level together, even if it relates to different features.
This completely flips the pros and cons.
- It is now easy to change things on one level, for example changing a database technology, changing the user interface, or in your case changing the table structure.
- However, it becomes more difficult to work on individual features. Adding a new feature or modifying an existing one will require changes across different layers.
- We can easily write unit tests that exercise each layer in isolation. However, it becomes more difficult to test features in an end-to-end manner, since it’s not clear which functionality of the lower layers is depended upon.
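To make the layer-isolation point concrete: a pure table-assembly function can be unit-tested with plain data and no I/O at all. A minimal sketch, using a hypothetical `add_author_bios_to_table` as the pure layer:

```python
def add_author_bios_to_table(table, bios):
    # Pure: returns a new table, touches no files or network.
    return {**table, "author_bio": bios.get(table["author"], "")}

def test_known_author_gets_bio():
    t = add_author_bios_to_table({"author": "Herbert"}, {"Herbert": "b. 1920"})
    assert t["author_bio"] == "b. 1920"

def test_unknown_author_gets_empty_bio():
    t = add_author_bios_to_table({"author": "Unknown"}, {})
    assert t["author_bio"] == ""

test_known_author_gets_bio()
test_unknown_author_gets_empty_bio()
```

An end-to-end test of the "author bios" feature, by contrast, would also need the loader and therefore a database fixture or a fake.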
This strategy is extremely common. We see this separation in the original MVC architecture, in many design patterns, in the classic layered architecture (presentation – business – persistence – database), in the Clean/Hexagonal/Onion architecture, and in concepts like “functional core, imperative shell”.
In general, you can choose by considering which changes are likely. Are you likely to change the UI without changing the business logic, or vice versa? Consider organizing by layer. Or are you more likely to keep the code in each layer stable, but you want to add more features easily? Then organizing by feature makes more sense.
But these approaches are not complete opposites – you should draw module boundaries wherever they are appropriate. For example, a large web application might separate the frontend/UI from the backend (organized by layer), but divide the backend into separate microservices (organized by feature).
In general: things that change together should be close to each other. The fancy word for this is “cohesion”; compare also some interpretations of the “single responsibility principle”.
Ultimately, the important point is not to follow some architecture for the sake of the architecture, but to look for seams in your codebase where it is appropriate to decouple modules.
The importance of these architectural principles also varies with the scale of the system.
- In short examples, it just doesn’t matter.
- In larger programs, consider the above heuristics about the direction of change.
- Architecture becomes more important when the software system is maintained by a team of people, or even by multiple teams. At scale, the software architecture and the organization’s structure will likely mirror each other (→ Conway’s Law). If individual teams should be able to deliver value independently, this will likely require some organization by features, for example by adopting a microservice architecture.
This is about finding the appropriate separation of concerns:
- In the “pure” approach, you decide to separate the I/O concern from the internal processing domain. You could change the database representation with minimal impact on the internal processing.
- In the “modular” approach, you prefer to group the operations by type of data. You could change the data to be processed with minimal impact on the other modules.
Whatever your choice, you will facilitate some changes and make others more difficult, with a larger impact on other components.
Ideally you would separate the concerns further: separate the I/O from the rest, and separate the different kinds of data and their corresponding internal logic/behavior. You could do that with an object-oriented polymorphic design or a generic design, combined with dependency injection. But a more granular separation of concerns implies higher complexity, so in the end it’ll be up to you to find the best balance for your project.
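One way to sketch that more granular separation in Python: give each kind of data its own small module behind a shared interface, and inject the I/O so it can be swapped for fakes in tests. Everything here (the `TableSection` protocol, `AuthorBios`, `build_table`) is a hypothetical illustration, not an established API:

```python
from typing import Protocol

class TableSection(Protocol):
    """One feature: knows how to load its data and how to add it to the table."""
    def load(self) -> dict: ...                             # I/O step, injectable
    def add_to(self, table: dict, data: dict) -> dict: ...  # pure step

class AuthorBios:
    def load(self) -> dict:
        # Real I/O would go here; stubbed with sample data for the sketch.
        return {"Herbert": "Frank Herbert, 1920-1986"}

    def add_to(self, table: dict, data: dict) -> dict:
        return {**table, "author_bio": data.get(table.get("author"), "")}

def build_table(base: dict, sections: list[TableSection]) -> dict:
    """Generic driver: the cross-cutting load-then-transform step lives here once."""
    table = dict(base)
    for section in sections:
        table = section.add_to(table, section.load())
    return table

t = build_table({"author": "Herbert"}, [AuthorBios()])
```

Adding a feature now means adding one class; changing the load/transform sequencing means touching only `build_table`. That flexibility is exactly the extra complexity mentioned above.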