It’s about tests not builds
The focus on builds, which is reinforced by Fowler’s paper, subtly corrupts a practice that should be founded on good, fast testing. Tests need to take center stage, and the build needs to be considered just a simple test of compilation.
Where Build-Focused CI Practices Fail
To be truly successful in CI, Fowler asserts that the build should be self-testing and that these tests include both unit and end-to-end testing. At the same time, the build should be very fast — ideally less than ten minutes — because it should run on every commit. If there are a significant number of end-to-end tests, executing them at build time while keeping the whole process under ten minutes is unrealistic. Add in the demand for a build on every commit, and the requirements start to feel improbable. The options are either slower feedback or the removal of some tests.
In the 2006 rewrite, Fowler addresses the problems of long-running tests by suggesting a “staged build.” In a staged build, fast tests are run in “commit builds” and slower tests are run as part of “secondary builds.” Commit builds would then provide quick feedback on the most important issues while the secondary builds execute additional tests to detect less obvious integration errors. This is certainly an improvement over the difficult situation previously discussed, but in practice the many types of build required by this approach are problematic.
How does one ensure that the sources a secondary build uses match those that first passed a commit build? How does one relate the results of various slow builds to each other and to the results of the commit build? These types of problems tend to be difficult to solve, often requiring a good deal of cleverness and excessive source control labeling/tagging, and at best they remain only partly solved. Further, speed is of the essence when practicing CI; however, the staged build approach reruns the same compilation several times when running different tests with each build. The extra building wastes resources that could be running tests.
Other complications arise when CI is scaled beyond the trivial project. In an enterprise environment, many tests that detect integration problems do not reside in source control, where build scripts can easily launch them. The tests may live in enterprise testing tools such as Borland’s Silk Central or HP’s Quality Center. A QA team may have testers devoted to testing recent builds and ensuring that the new functionality actually works. It’s difficult to have build scripts run these outside systems, and manual testing simply does not fit into the scope of an automated build script.
Added to this, the obsession with making everything a build hurts traceability, limiting what can be done and wasting time. Unfortunately, a number of tools are built around this principle of automating builds of various types. As one would expect from the CI community, these automated build systems typically (and inaccurately) call themselves “Continuous Integration Servers.” A focus on build restricts these tools to only providing proper CI support for trivial projects. Fowler’s paper likely contributed to this unfortunate state of affairs.
A Better Way
By discarding build as a focus, what remains are integrations and tests of those integrations. In practice, developers continue to integrate many times a day, and tests are run to see if errors were introduced during those many integrations. Each set of tests is run as often as there is something new to test and resources are available.
The first and most fundamental test is the compile test. On every commit, a process gets the source code and compiles it. By adding the execution of some additional fast tests, this process suddenly looks very much like Fowler’s “commit build.” It’s the only build necessary, though, and so for our purposes it will just be “the build.” When the build is done, the team is notified of its status and of any critical problems that need to be addressed.
The various other slower tests still need to happen. Instead of several “secondary builds”, the other processes are simply functional tests, stress tests and deployments to manual testing environments. Tests are not builds; deployments to QA are not builds; and neither should be called builds.
In order to perform these various tests, built software is still required. Fortunately, the build creates the software we want to test every time there is a commit. If there is an hour-long functional test process, it merely needs to be able to pick up the most recently built software that has passed fast tests, move it to the functional testing environment, and run the longer tests on the artifacts. That can happen every hour while our fast build happens several times an hour.
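The selection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Build` record, its fields, and the `latest_testable_build` function are hypothetical names, not part of any particular CI server. It shows a slow test process picking the newest build that passed fast tests, rather than rebuilding from source.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Build:
    """One output of the commit-time build (hypothetical record)."""
    build_id: int            # assumed to increase with every build
    revision: str            # source revision the artifacts came from
    fast_tests_passed: bool  # result of compile plus fast tests


def latest_testable_build(
    builds: Sequence[Build],
    last_tested_id: Optional[int] = None,
) -> Optional[Build]:
    """Return the newest build that passed fast tests and has not
    already been picked up by this slow test process, or None."""
    candidates = [
        b for b in builds
        if b.fast_tests_passed
        and (last_tested_id is None or b.build_id > last_tested_id)
    ]
    # max() with default=None handles the "nothing new to test" case.
    return max(candidates, key=lambda b: b.build_id, default=None)
```

An hourly functional test job would call `latest_testable_build` with the id of the build it tested last; a `None` result means there is nothing new to test, so the run can be skipped. Because the job works on artifacts that already exist, the revision under test is known without any extra source control labeling.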
If the results of those tests (and any other tests run against this one build) are collected together, one can get an increasingly complete view of the quality of the software and be ready to correct faults caused by new code. To do this well, the system needs to be able to reach beyond the confines of tests stored in source control and run tests stored in enterprise QA systems.
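As a rough sketch of what collecting results against one build might look like, the Python below groups test outcomes by build. The `SuiteResult` record, the suite names, and both functions are assumptions made for illustration; results could just as well come from a manual QA sign-off or an enterprise testing tool as from a script.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class SuiteResult:
    """Outcome of one test process run against one build (hypothetical)."""
    build_id: int
    suite: str    # e.g. "compile", "functional", "stress", "manual-qa"
    passed: bool


def summarize_by_build(
    results: Iterable[SuiteResult],
) -> Dict[int, Dict[str, bool]]:
    """Group results so each build maps to {suite name: passed}."""
    summary: Dict[int, Dict[str, bool]] = defaultdict(dict)
    for r in results:
        # A suite counts as passed for a build only if every
        # recorded run of that suite against the build passed.
        previous = summary[r.build_id].get(r.suite, True)
        summary[r.build_id][r.suite] = previous and r.passed
    return dict(summary)


def failing_suites(
    summary: Dict[int, Dict[str, bool]], build_id: int
) -> List[str]:
    """List the suites known to have failed for the given build."""
    return sorted(s for s, ok in summary.get(build_id, {}).items() if not ok)
```

Keying everything on the build identifier is the design choice that matters here: each slow test process reports against the build it picked up, so the picture of that build fills in over time without rerunning compilation.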
As this understanding of CI permeates the community, there should be an increasing number of tools that provide facilities to run secondary test processes against applications built earlier in the day or week. And this is happening right now. My employer, UrbanCode, has used the test approach to CI in its CI Server for some time now, and other tools have recently started to adopt this strategy as well. Teams using this new breed of CI Server no longer need to juggle many builds. The freedom to act on existing builds allows the CI server to be used as the basis for a release management system that helps move the software out of the testing environments and into production.
All of this, though, is only possible by shifting the focus of CI theory and tooling from build to test. As such, a revised definition would be:
“Continuous Integration is a software development practice where members of a team integrate their work frequently. Integrations are verified by tests (including build) to detect integration errors as quickly as possible.”
