I am attempting to refactor a component of a large project, which currently has a lot of dependencies to global state of the project environment. The goal (for my taste) is a “SOLID” architecture, where everything can be properly tested.
Sometimes the dependency may simply be a string or value: A specific directory, a machine name, sometimes maybe an array.
However, the dependency can not be injected as a simple value at construction time, because the value might not be available yet.
The value might be coming from a dependency injection container, from the configuration system, etc. Currently the class itself calls to legacy functions or static methods of these subsystems, to get the respective values. This is bad because of the global state dependency, and because the component knows too much about these subsystems.
Idea
So my idea was to create one “data provider” interface per dependency, with different implementations for “real” and “fake”. An interface expresses one of the needs of a component that depends on it, not more. Thus, most of them have just one method. Some are being reused, but not a lot.
The main functionality is also split into sub-components with interfaces. Some of them have cache or buffer decorators. One part is split into a chain of processors, which implement a “(Name)ProcessorInterface”.
Problem: Too many classes and interfaces.
This works great, and everything is super-testable.
I use the same style all the time in my own projects, and would not hesitate to do it again here.
But here it is not my choice. This is a huge OSS project, with a large community of developers, which I need to convince, if I want to do it this way. And the rest of the code, for the most part, does not look like this.
The number of classes and interfaces is getting very high.
And a lot of them are just “internal”, so they are not meant as part of a public API.
Here is one example of classes I am/was going to add. This is too big to post here. The original was simply a call to drupal_get_profile()
, which of course makes testing a lot more difficult.
Either I am wrong, or I need a better way to sell this.
Questions
To summarize the main questions:
- Is it a problem if some interfaces are not meant for the “public API”, but for “internal” use only?
- Is there a problem there are too many classes? Should complexity be measured in number of classes, or average size of each individual class?
- Is there a way to reduce the number of classes and still have the benefits of a “solid” architecture?
I have my own answer for most of these, but I have the feeling this is not sufficient to convince anyone.
More thoughts
An alternative would be to have more “generic” interfaces, e.g. “StringProviderInterface” instead of “ProfileNameInterface”. But then I would lose the semantic specificity. Just because they all return strings, does not make them exchangeable.
Maybe the interfaces will be reused more often, if more code transforms into this style.
Update
As the discussion is going on, a major and repeated concern that I hear is that every new interface needs to be backwards compatible, even if it is meant for something “internal”. I technically agree with this, but I think the same could be said about classes and public methods. So I wonder if this is really a valid concern. Maybe the real problem is to have awkward one-off interfaces dangling around.
8
Thus, most of them have just one method.
This looks smelly even in statically checked, object oriented languages such as Java or C# which don’t discourage creating more and more interfaces and classes. In languages such as PHP, Ruby or Python, this looks even wronger. An interface and two classes for just a string? Seems like too much architecture and not much KISS.
Some are being reused, but not a lot.
This doesn’t look right either. If two interfaces have nothing in common except the type they encompass, keep them strictly separate. Reusing works well for actual types. Say Distance
type can be used to store the distance between two Building
s, but also as a Length
property of Wall
class. But this doesn’t work well for dependency injected values, for example when injecting the database connection string and the address of the cache server.
Keep similar information together
For instance, if the application has configuration with a few dozen values, there is nothing wrong in keeping them together within a AppConfiguration
class and use a IAppConfiguration
interface. The AppConfigurationStub
could let you define specific values for the tests while keeping other undefined/unset.
If a specific class of your application needs to know a connection string to the database as well as the maximum number of products to show on the home page, give it the full application configuration object, not just those two pieces of information. It doesn’t matter that the class doesn’t need to know the e-mail address of the administrator or the list of languages available: if the class doesn’t need this information, it simply won’t access it.
Of course, if you manipulate confidential data, things are different. For instance, IUserConfiguration
interface may point to the color preferred by the user or the number of products in the cart—information which is not particularly sensitive, but also the credit card number of the e-mail address—information you should keep secure. Providing a common interface to both sensitive and non-sensitive information would be a mistake; instead, you could have a separate interface with person’s personal information (e-mail, phone number, birth date, etc.), and a third interface with highly sensitive information (credit card number).
Another exception is when, very logically, a class couldn’t be interested by anything but one field of an object. For instance, classes from data access layer (DAL) may need to know exclusively the connection string, and shouldn’t bother about the maximum number of products to show on home page (since this is business logic, and will be passed later to the specific method of DAL under a form “give me please 20 products, without being concerned where I’ll display them.”) In this situation, DAL shouldn’t even know anything about the aspect of application configuration or even application per se. It can be a service. Or a cron job. They could have their own configuration, which has a very specific structure.
KISS
If a class needs a scalar value (such as a string or an array), don’t wrap it in a class/interface/stub structure just for the sake of dependency injection. Dependency injection is expected to make your life easier; it is not expected to force you to over-architecture your system.
Inject information just-in-time
You mentioned a problem of not knowing some pieces of information when initializing a class. If you don’t know it at this stage, very probably the class being initialized doesn’t need it either during the initialization. If it need it sometimes later, you may postpone dependency injection until you actually call the method which needs the information.
For instance, instead of:
$productsData = new ProductsData($region);
...
$productsData->loadProducts();
you can do:
$productsData = new ProductsData;
...
$productsData->loadProducts($region);
Sometimes, you can’t do that. For instance, if most methods of a class require a piece of information, you’ll rather give this information once when initializing the object rather than every time you call the methods of this object.
In this case (and if you need to pass a single value such as a string or an array), you can use a construct similar to C#’s Lazy<T>
, which is a way to tell the class “hey, I don’t know the exact value right now (or don’t want to compute it right now, because it’s computationally expensive and you may not even need it), but I know how to get it when and if, later, you actually need it”. You can reimplement the pattern in all languages where functions/methods are first-class citizens.
8