I’d like to introduce test-driven development practices in an established legacy product. As with most legacy products, ours could use quality improvement through safe, gradual refactoring and better code coverage, and TDD for both new features and defect fixes could go a long way.
However, there are a few challenges: in some cases writing tests requires large refactoring in itself (e.g. no testable interfaces, or arcane templated dependencies). There is also the human factor: TDD is a big shift from the current process.
Hence, I’m looking for experience sharing – has anyone had a similar situation with introducing TDD in a legacy product? If so, what worked and what did not?
TDD does not really apply to maintenance. Instead, Test Driven Development is an iterative model that focusses on the development or implementation phase. Normally, the software life cycle would first produce designs, which are then implemented as code, which is then tested & released. TDD reorders these for each iteration: we write tests first, implement enough code to make them pass, then refactor the code to make it resemble some sensible design.
You can still use TDD when creating or changing functionality. This is especially straightforward for new code. How does that code interact with the rest of the application? We can capture those interactions and turn them into testable seams. When this is not possible or sensible, either due to the required effort or the insane architecture that would result, we cannot apply TDD with unit tests. Instead, we will have to phrase our tests in terms of larger components or even in terms of the whole system. Every system has at least one seam: the system boundary. Of course such tests become incredibly cumbersome, but nothing is completely untestable.
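As a rough sketch of what a test at the system boundary can look like, assuming the legacy product can be driven as a command-line program: the test below runs the whole thing as a black box and pins down its current output. `run_at_boundary` and `LEGACY_COMMAND` are hypothetical names, and the one-line Python program only stands in for the real executable.

```python
import subprocess
import sys


def run_at_boundary(command, stdin_text=""):
    """Run the whole program as a black box and capture its exit code and stdout."""
    completed = subprocess.run(
        command,
        input=stdin_text,
        capture_output=True,
        text=True,
        timeout=30,
    )
    return completed.returncode, completed.stdout


# Assumption: a stand-in for the real legacy executable. It upper-cases stdin.
LEGACY_COMMAND = [
    sys.executable,
    "-c",
    "import sys; print(sys.stdin.read().upper(), end='')",
]


def test_uppercases_input():
    # Pin down the behaviour observed today, before touching any internals.
    code, out = run_at_boundary(LEGACY_COMMAND, "hello\n")
    assert code == 0
    assert out == "HELLO\n"
```

Tests like this are slow and coarse, but they give a safety net while smaller seams are being introduced.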
On a social level, it is difficult to enforce TDD. TDD is a possible structure for a personal productivity cycle, and is not really applicable for a whole team.
A red-green-refactor cycle might be as quick as a few seconds to minutes, so it is not possible to enforce a process here. If developers want to train their personal TDD skills, pair programming can be effective. The two devs can help each other stay on track and stick to the red-green-refactor cycle instead of writing code they think might be needed in the future.
Aside from being unenforceable, TDD might not be applicable for every dev. E.g. I prefer to write code first, then methodically test & refactor it until all edge cases are covered. The final result – functionality with excellent test coverage – is indistinguishable from code written with TDD.
If the whole team wants to commit themselves to increasing test coverage, using a CI server can be valuable. After each checkin, the whole test suite is executed. It would be possible to mark the build as failed if the test coverage goes down (i.e. new code was added without tests). However, such a metric can be easily gamed, and test coverage is not a measure of test quality.
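A minimal sketch of such a check, assuming the CI job can obtain the coverage percentage of the previous build and of the current one (how those numbers are produced depends on your coverage tool and is not shown here). `coverage_gate` and its `tolerance` parameter are made up for illustration.

```python
def coverage_gate(previous_percent, current_percent, tolerance=0.0):
    """Return 0 (pass) unless coverage dropped by more than `tolerance` points.

    The return value is meant to be used as a process exit code.
    """
    if current_percent + tolerance < previous_percent:
        return 1  # coverage went down: mark the build as failed
    return 0


# Example: the previous build had 80.0% coverage, the current one 79.5%.
exit_code = coverage_gate(80.0, 79.5)
```

A CI script could pass the return value to `sys.exit` so that a drop fails the build. A small tolerance avoids failing on noise, but as said above, the number itself says nothing about whether the tests are any good.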
More often than introducing TDD practices into a legacy system, we’re interested in bringing a legacy system under test. As you have experienced, this can be quite difficult when the application was not designed for testability. But this is just about testing, not about TDD. I have struggled with this as well, but have found that it is possible to generate the necessary seams through simple, mechanical transformations that do not affect the correctness of the code being worked on.
In a nutshell: In the code we are trying to test, identify relevant dependencies on external functions or objects. We can then define an interface for these, and allow callers of our code to supply their own implementation of these interfaces. However, we supply default values that forward all calls to the current behaviour. This refactoring maintains the exact behaviour, and maintains source-compatibility: while the signature of the code under test is changed, no calling code needs to be updated. A Python example:
    def system_under_test(a, b):
        ...
        result = external_function(b, c)
        ...
This can be safely refactored to:
    def system_under_test(a, b, external_function_service=None):
        if external_function_service is None:
            external_function_service = external_function
        ...
        result = external_function_service(b, c)
        ...
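To show what this buys us on the test side, here is a self-contained sketch. The function bodies are invented for illustration (the snippet above elides them), so treat the arithmetic and the way `c` is computed as assumptions. Only the shape of the seam matters.

```python
def external_function(b, c):
    # Stand-in for the real dependency, which we do not want to hit in a unit test.
    raise RuntimeError("real dependency must not be called from unit tests")


def system_under_test(a, b, external_function_service=None):
    # Default to the current behaviour so existing callers are unaffected.
    if external_function_service is None:
        external_function_service = external_function
    c = a + 1  # hypothetical: how c is derived is not shown in the original
    result = external_function_service(b, c)
    return result * 2  # hypothetical post-processing


def test_system_under_test_uses_injected_dependency():
    calls = []

    def fake_external_function(b, c):
        # Record the arguments and return a canned value.
        calls.append((b, c))
        return 10

    outcome = system_under_test(1, 5, external_function_service=fake_external_function)
    assert outcome == 20
    assert calls == [(5, 2)]
```

Existing callers keep calling `system_under_test(a, b)` and get the old behaviour, while the test never touches the real dependency.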
If the external dependency is not a function but a class or object, the default implementation would be an Adapter that connects the current implementation to the new interface we introduced. I did a more thorough write-up on this approach on my blog, though parts of it are C++-specific: Extract Your Dependencies – Making your code ready to be tested.
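A sketch of the object case in Python, under invented names: `LegacyMailer` stands for some existing class with an awkward API, `Notifier` is the new interface, and `LegacyMailerAdapter` is the default that forwards to the current implementation. None of these come from a real library.

```python
class LegacyMailer:
    """Hypothetical existing class that the code currently uses directly."""

    def send_raw(self, address, payload):
        raise RuntimeError("would talk to a real mail server")


class Notifier:
    """The new interface that the code under test depends on."""

    def notify(self, recipient, message):
        raise NotImplementedError


class LegacyMailerAdapter(Notifier):
    """Default implementation: forwards calls to the existing behaviour."""

    def __init__(self, mailer=None):
        self._mailer = mailer if mailer is not None else LegacyMailer()

    def notify(self, recipient, message):
        self._mailer.send_raw(recipient, message.encode("utf-8"))


def close_account(user_email, notifier=None):
    # Callers that pass nothing still get the legacy mailer.
    if notifier is None:
        notifier = LegacyMailerAdapter()
    notifier.notify(user_email, "Your account was closed")
    return True


class RecordingNotifier(Notifier):
    """Test double that remembers what it was asked to send."""

    def __init__(self):
        self.sent = []

    def notify(self, recipient, message):
        self.sent.append((recipient, message))
```

In a test, `close_account("a@example.com", notifier=RecordingNotifier())` exercises the logic without sending mail, and the adapter itself is thin enough to verify with a fake `LegacyMailer`.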
Allocate resources to writing tests for old code and refactoring the untestable bits. This is going to be a slow and gradual process, but it will be slow no matter what you do: the reason you’re adopting TDD in the first place is to make the code more stable, not less, and breaking too many things at once would slow things down even further.
Keep note of which parts of the code tend to have more bugs or change more frequently with new features, and start with those. That’s where you’ll see the most impact from adding tests and refactoring. That cryptic module that nobody really understands, but that doesn’t cause many problems and hardly ever changes, can stay put for a while.
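One cheap way to approximate "changes more frequently" is to count how often each file shows up in version control history. The sketch below assumes the project lives in Git and that you feed it the text printed by `git log --name-only --pretty=format:` (one changed path per line, blank lines between commits). `rank_hotspots` and the sample paths are hypothetical, and change frequency is only a proxy for where the bugs are.

```python
from collections import Counter


def rank_hotspots(git_log_output, top=5):
    """Count how often each path appears in the log text, most frequent first."""
    changes = Counter(
        line.strip() for line in git_log_output.splitlines() if line.strip()
    )
    return changes.most_common(top)


# Made-up sample of three commits, for illustration only.
SAMPLE_LOG = """\
src/billing.cpp
src/report.cpp

src/billing.cpp

src/billing.cpp
src/util.cpp
src/report.cpp
"""

hotspots = rank_hotspots(SAMPLE_LOG, top=2)
```

Cross-checking the top of that list against the bug tracker gives a reasonable first set of modules to bring under test.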
This will be a self-reinforcing process, since the more tests you add, the easier refactoring will be and vice versa.
Set targets for each iteration, and track progress. You want to have less untested code every week. It doesn’t have to be much less, just less; you still have other work to do on the project, after all.
Have a team practice where no new code will pass code review without proper tests. Take the opportunity to include static analysis tools such as linters into your process and stick to a strict style guide for new code.