So I’m responsible for maintaining and enhancing a PHP codebase that runs to around 1,100 files and 120,000 lines of code.
In general, it’s not terrible, although since it was written over some 10+ years starting in the PHP 4 era, some parts are more terrible than others. However, this entire codebase has a grand total of two unit tests, one of which I wrote yesterday.
All the persistent data is stored in a database.
If the codebase has one big, obvious problem, it’s the giant 1700-line “god object” that sits in the middle of everything and acts as combination class factory, database abstraction, and session data storage.
The god object then has various different factories depending on whether it is being called from a web page (used by end users) or a command line (used by operators) or an operational script responsible for interfacing with the rest of the world to actually make things happen based on what users & operators do.
So it looks a little like this mockup example: https://gist.github.com/jdavidlists/f2abeaefa58a9a823f71bb42fbd3f49d
Where this turns ugly is that in addition to the Houses table in the database, there are another dozen or so house-related tables (Doors, Windows, Bookcases, Books, Rooms, Cabinets, Furniture, etc.), all with their own WhateverTable and WhateverRow classes. And there’s also a Streets table. And a Sidewalks table. And a Driveways table (and of course a house can have more than one driveway, and a driveway can be built where there’s not a house yet, and a house’s driveway can connect to more than one street, and a driveway might connect to more than one house). And there’s a Cars table. And an Owners table. And Streets, Driveways, Cars, and Owners all have another dozen or so related tables.
What I’m getting at here is that the system being modeled is an existing real world system that is insanely complex and full of interdependencies, many-to-many relationships, special cases and exceptions that have to be accommodated.
For example, to add a Door to the House, you’ll wind up:
- Creating a BuildingPermit object.
- Creating a Door object.
- Possibly creating a Sidewalk object. (Or a Driveway object if it’s a GarageDoor subclass.)
- Adding a Door-to-House relationship.
- Adding a Door-to-Sidewalk relationship.
- Possibly adding Sidewalk or Driveway relationships to one or more Streets.
- Modifying (or possibly adding) a Room object.
- Wait for BuildingPermit to reach state Approved.
- Issue asynchronous command to have the actual door built.
- Wait for that to finish.
Of course any of these things might fail, which means they all have to be rolled back, so now your add-a-door method (in whichever class you decide all this actually belongs) at the top of the stack has to be aware of transaction safety at the database level all the way at the bottom. And the is-this-valid tests include some 2 KiB SQL queries with 8 joins, 2 subqueries, and a HAVING condition that make the tester in you go home early, hide in a closet, and sob.
Currently, this is handled by passing the God object around all over the place. The theory is that each object can do its bit and pass off to the next object. The practice is that every object is now aware of and dependent on every other object. There’s almost no such thing as encapsulation because the relationships between the real-world objects being modeled are so tight and interdependent.
All the dependency injection and inversion of control stuff I see says stuff like “If your House needs a Door, pass the Door in the constructor and have a HouseFactory. That way you can pass a mock door or NULL for testing.” Which is great for a simple one house : one door model. But in this case, the user (or the factory) has no idea how many doors the house might have, as that’s buried in the database.
Given the above, it’s easy to see how things evolved to their current state over a long period of time. But what’s less clear is what to do about it now.
The good news is this code works remarkably well, all things considered. The bad news is that doing a total rewrite is out of the question.
It is possible to make big changes, eventually. It’s just really slow and requires backwards compatibility code loaded down with “trigger_error( ‘Don’t do this anymore.’, E_USER_DEPRECATED );” calls. Using this approach, I’ve already managed to consolidate the three nearly-but-not identical database abstractions down to just one, which was a big project and a big win, with minimal disruption.
Now I’d like to figure out where to go from here. Over time I would like to make things better and not worse. This seems likely to come in two flavors:
1) Modifications that make testing easier. (Read: possible.) And, of course, creating the tests.
2) When making enhancements, push things that are being touched in the right direction. I.e. slowly phase out use of the God object in favor of whatever the right answer is. (While making the God object pass things through to the new model so existing code continues to work.)
So, although it’s not an option, if this was being written from scratch today, what would be the right approach for such a complex, interdependent, database-backed model? And then the real question is: what’s the best way to safely ease things in that direction over time?
Note: Thusfar, experiments with registry-based IoC approaches don’t fare well, because they wind up registering hundreds of classes on every page load, of which only a handful wind up being used, and that negatively affects performance. At least with the God object, the thing sits in the opcache all day long, ready to go.
Thanks for any suggestions!
5
The first question that comes to my mind when reading your question is not how one could refactor such a beast (»1,100 files and 120,000 lines of code.«), it is more »why on earth would anybody do that?«
I know the strive for the perfect design and I too love clean code. But ont the other hand, I have only 8h a working day and there are funnier things to do than that. More than once I arrived at a job, where a complete mess awaited me. So, I know that »OMG-WTF! let’s clean this!« feeling.
But why do you want to do it?
There are possible solutions that are way better than refactoring such a behemoth:
I) If the core runs, you are able to make clean extensions. If there is much ceremony and boilerplate involved in getting things running, you could stuff things behind a Facade – like cleaning a room when putting everything in drawers. So you have done something valuable: You delivered business value (a new feature – with clean code) and have some feeling of tidyness.
II) If the core doesn’t allow extensions in one or the other direction, there are two decisions to make:
II a) How expensive is it to make partial changes to the core in order to deliver value with justifiable effort. Some minor changes here and there making your life easier is perhaps enough to get things done.
It is never justifiable to completey refactor 120k LOC.
II b) If you come to the conclusion, there is an unjustifiable amount of work to do, you should play with the cards on the table and get a team starting a complete rewrite. Sometimes this is the way to go.
But which direction you want to go, you have to make a business based decision.
That is all, I could do for you at the moment.
2
You’re responsible for enhancing and maintaining the codebase. Is that the only responsibility that you have ?
If it is, then the game is pretty simple.
Start small. DO NOT TOUCH THIS GOD. It’s a God for a reason. It is used everywhere, it is too massive and too powerful, you won’t be able to test it anyway and any change could break ANY part of your codebase.
Your real goal is not to make this class cleaner. This class is your enemy. Make it useless by removing calls to this class in small parts of the application, and replace it with clean and tested code.
This will take a lot of time, but at some point this class won’t even be used anymore. You’ll be able to simply remove it.
So, simple strategy :
– Very small part of the application using the god object : target acquired.
– Make everything possible to test it. Mock the god object, inject it.
– Now inject a better, clean and tested implementation of the god object in this small part.
– Rinse and repeat with other small parts.
– Remove the god object.
This should be a pretty huge improvement already.
3
I haven’t done a lot of coding with PHP, but generally
There are few steps you can start with,
-
Create interfaces for each responsibility of the God Object and alter it to implement these interfaces. This is a minimal risk as there are no actual changes done in logic, just adding interfaces.
-
When you want to modify a section where God Object is used, you could change the God Object reference to use the appropriate interface(s), instead of God Object. This will remove direct dependencies of God Object and it will be more easier to mock and test.
-
At some point when there are minimal references to the actual God Object, you can start to split up the object such that one per interface.