Why would you run unit tests on a CI server?
Surely, by the time something gets committed to master, a developer has already run all the unit tests before and fixed any errors that might’ve occurred with their new code. Isn’t that the point of unit tests? Otherwise they’ve just committed broken code.
Or not. There are many reasons why that assumption can fail:
- The developer doesn’t have the discipline to do that
- They have forgotten
- They didn’t commit everything and pushed an incomplete commit set (thanks Matthieu M.)
- They only ran some tests, but not the whole suite (thanks [nhgrif])
- They tested on their branch prior to merging (thanks [nhgrif] * 2)
But the real point is to run the tests on a machine that is not the developer machine. One that is configured differently.
This helps catch out issues where tests and/or code depend on something specific to a developer box (configuration, data, timezone, locale, whatever).
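To make the "configured differently" point concrete, here is a hypothetical Python sketch (the `local_day` helper, the timestamp, and the timezones are invented for illustration; `time.tzset` is Unix-only) of code whose test quietly depends on the machine's timezone:

```python
import datetime
import os
import time

def local_day(timestamp):
    # Naively converts a Unix timestamp to a date using the *local*
    # timezone -- the hidden dependency that a differently configured
    # CI box will expose.
    return datetime.datetime.fromtimestamp(timestamp).date().isoformat()

ts = 1609461000  # 2021-01-01 00:30 UTC

# On a developer machine in UTC+1 it is already Jan 1st, so a test like
# `assert local_day(ts) == "2021-01-01"` passes...
os.environ["TZ"] = "Europe/Paris"
time.tzset()
print(local_day(ts))  # 2021-01-01

# ...but a CI server running in UTC-8 still sees Dec 31st, and the
# exact same assertion fails there.
os.environ["TZ"] = "America/Los_Angeles"
time.tzset()
print(local_day(ts))  # 2020-12-31
```

The developer never sees the failure on their own box; the first clean machine in another timezone does.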
Other good reasons for CI builds to run tests:
- Testing on platforms other than the main development platforms, which may be difficult for a developer to do (thanks [TZHX])
- Acceptance/integration/end-to-end/really long-running tests may be run on the CI server that would not usually be run on a developer box (thanks [Ixrec])
- A developer may make a tiny change before pushing/committing, thinking it is a safe change and therefore not running the tests (thanks [Ixrec] * 2)
- The CI server configuration doesn’t usually include all the developer tools and configuration, and is thus closer to the production system
- CI systems build the project from scratch every time, meaning builds are repeatable
- A library change could cause problems downstream – a CI server can be configured to build all dependent codebases, not just the library itself
As a developer who doesn’t run all the integration and unit tests before making a commit to source control, I’ll offer up my defense here.
I would have to build, test and verify that an application runs correctly on:
- Microsoft Windows XP and Vista with Visual Studio 2008 compiler.
- Microsoft Windows 7 with Visual Studio 2010 compiler.
- Oh, and the MSI builds for each of those.
- RHEL 5 and 6 with GCC 4.1 and 4.4 respectively (similarly CentOS)
- 7 soon. Woop-de-woop.
- Fedora Workstation with GCC, for the three most recent releases.
- Debian (and derivatives like Ubuntu), for the three most recent releases.
- Mac OS X, for the three most recent releases.
- And the packages (rpm, dmg, etc)
Add in the Fortran (with both Intel and GNU compilers), Python (and its various versions depending on OS) and bash / bat script components and, well, I think you can see how things spiral out of control.
So that’s sixteen machines I’d have to have, just to run a few tests a couple of times a day. It would be almost a full time job just to manage the infrastructure for that. I think almost anyone would agree that’s unreasonable, especially multiplying it out to the number of people in the project. So we let our CI servers do the work.
Unit tests don’t stop you committing broken code; they tell you when you’ve broken something they know about. People can say “unit tests should be fast”, and go on about principles, design patterns and methodologies, but in reality it’s sometimes just better to let the computers we’ve designed for repetitive, monotonous tasks do those tasks, and only get involved when they tell us they’ve found something.
You’d think so wouldn’t you – but developers are human and they sometimes forget.
Also, developers often fail to pull the latest code. Their tests might run fine locally, but at the point of check-in someone else may have committed a breaking change.
Your tests may also rely on a local (unchecked-in) resource – something a local unit test run wouldn’t pick up.
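As a hedged illustration of such a local-resource dependency (the file name, directories and values here are invented, not from any real project):

```python
import json
import os
import tempfile

def load_rates(directory):
    # The code quietly depends on a data file in the checkout -- easy to
    # have lying around locally, easy to forget to commit.
    with open(os.path.join(directory, "rates.json")) as f:
        return json.load(f)

# On the developer's box the file happens to exist, so the test passes:
dev_checkout = tempfile.mkdtemp()
with open(os.path.join(dev_checkout, "rates.json"), "w") as f:
    json.dump({"EUR": 1.1}, f)
assert load_rates(dev_checkout)["EUR"] == 1.1

# The CI server starts from a clean checkout containing only what was
# actually committed, so the same test fails there and exposes the gap:
ci_checkout = tempfile.mkdtemp()
try:
    load_rates(ci_checkout)
except FileNotFoundError:
    print("CI caught the uncommitted dependency")
```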
If you think all the above is fanciful, there is a level above CI (on TFS at least) called Gated where builds that have failing tests are shelved and aren’t committed to the code base.
Apart from the excellent Oded answer:
- You test the code from the repository. It may work on your machine with your files… that you forgot to commit. It may depend on a new table whose creation script is missing (in Liquibase, for example), or on some configuration data or properties files.
- You avoid code integration problems. One developer pulls the latest version, writes unit and integration tests, adds code, passes all tests on their machine, commits and pushes. Another developer has just done the same. Both changes are correct on their own, but merged together they cause a bug. This could happen during the repository merge, or the changes may simply not be detected as a conflict. E.g. Dev 1 deletes a file that appeared to be unused, while Dev 2 codes against that file and tests without Dev 1’s changes.
- You develop a script to deploy automatically from the repository. Having a universal build-and-deploy script solves a lot of issues. A developer may have added a library or compile option that is not shared by everybody. Not only does this save you time, but more importantly, it makes the deployment safe and predictable. Furthermore, you can go back in your repository to version 2.3.1 and deploy that version with a script that works with it. This includes database objects like views, stored procedures and triggers, which should be versioned (or you won’t be able to go back to a workable version).
- Other tests: like integration, performance and end-to-end tests. These can be slow and might involve testing tools like Selenium. You may need a full set of data in a real database instead of mock objects or HSQL.
I once worked at a firm that had a lot of deployment bugs caused by its merging and deployment process, aggravated by a weird proprietary framework that made testing and CI hard. It was not a happy experience to find that code that worked perfectly in development didn’t arrive intact in production.
by the time something gets committed to master
I usually set up my CI to run on every single commit. Branches don’t get merged into master until the branch has been tested. If you’re relying on running tests on master, then that opens a window for the build to be broken.
Running the tests on a CI machine is about reproducible results. Because the CI server has a known clean environment pulled from your VCS, you know that the test results are correct. When running locally, you could forget to commit some code needed for them to pass, or have uncommitted code that makes them pass when they should be failing.
It also can save the developers time by running different suites in parallel, especially if some are slow, multi-minute tests that aren’t likely to be run locally after each change.
At my current work our production deployment is gated on CI passing all tests. The deploy scripts will prevent deployment unless they’re passing. This makes it impossible to accidentally forget to run them.
CI being part of the workflow takes burden off of developers as well. As a developer, do you usually run a linter, static analyzer, unit tests, code coverage, and integration tests for every single change? CI can do all of it, completely automatically and without anyone needing to think about it – reducing decision fatigue.
By the time something gets committed to master, a developer should have already run all the unit tests … but what if they haven’t? If you don’t run the unit tests on the CI server, you won’t know until someone else pulls the changes to their machine and discovers the tests just broke for them.
In addition, the developer may have made a mistake and referenced a local resource specific to their machine. When they check in the code and the CI run fails, the problem is immediately identified and can be corrected.
Assuming (contrary to other answers) that developers are quite disciplined and do run unit tests before committing, there can still be several reasons:
- Running unit tests can take a long time under some special setups. For example, running them under a memory checker (like Valgrind) can take much longer; even though all unit tests pass, the memory check can fail.
- The result is not that important under some special settings. For example, running unit tests to measure code coverage requires special compile flags. For ordinary developers, code coverage is not that important; it matters more to people responsible for keeping code quality up, like team leads.
It is possible to imagine cases where change A does not break the tests, and change B does not break the tests, but A and B together do. If A and B are made by different developers, only the CI server will detect the new bug. A and B may even be two parts of the same larger change.
Imagine a train driven by two locomotives, A and B. Maybe one is more than enough, so removing the other looks like a valid fix. However, if the two “fixes” are applied together and both locomotives are removed, the train will not move.
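The locomotive analogy can be sketched as two changes that each pass the tests in isolation but fail when combined – roughly, in Python (the names and data here are invented for illustration):

```python
# Shared code: two redundant "locomotives" pulling the same train.
def power(engines):
    # The train moves as long as at least one engine is attached.
    return len(engines) >= 1

baseline = ["engine_a", "engine_b"]

# Dev A's change, tested in isolation: remove engine_a -- still moves.
change_a = [e for e in baseline if e != "engine_a"]
assert power(change_a)  # passes on Dev A's machine

# Dev B's change, tested in isolation: remove engine_b -- still moves.
change_b = [e for e in baseline if e != "engine_b"]
assert power(change_b)  # passes on Dev B's machine

# The merge applies both removals; only a build of the merged result
# (i.e. the CI server) ever runs the tests against this combination.
merged = [e for e in baseline if e not in ("engine_a", "engine_b")]
print(power(merged))  # False -- the train no longer moves
```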
Also, not all developers run all the unit tests, even though most good developers do.
Let’s ask an equivalent question:
Why would you build the code on a CI server?
Surely, by the time something gets committed to master, a developer has already built the code before and fixed any errors that might’ve occurred with their new code. Isn’t that the point of building code? Otherwise they’ve just committed broken code.
There are several reasons for doing CI, but the main point is to get an idea of the state of the code over time. The main benefit (out of several) this provides is that we can find out when the build breaks, figure out what broke it, and then fix it.
If the code is never broken, why do we even use CI? To deliver builds for testing, nightly builds would be good enough.