Chapter 1. What's the point of Test Driven Development?

One must learn by doing the thing; for though you think you know it, you have no certainty, until you try.

--Sophocles

Software Development as a Learning Process

A worthwhile software project is attempting something that nobody has done before (or at least that nobody in the organisation has done before). Something will be different: the people involved, the application domain, the technology being used, or (more likely) a combination of these. In spite of the best efforts of our discipline, all but the most routine projects have elements of surprise. Interesting projects, the ones that are likely to provide the most benefit, usually have a lot of surprises.

As developers, we often don't completely understand the technologies we're using. We have to learn how they work while we complete the project. Even if we have a good idea of how they work, new applications force us into unfamiliar corners of frameworks that we use, and a system that combines many components becomes too complex for any one person to understand all of its possibilities.

For customers and end-users, the experience, if anything, is worse. The process of building a system forces them to look at their domain more closely than they have before, leaving them to negotiate and codify processes that, until now, may have been based on convention and experience.

Everyone involved in a software project has to learn as it progresses. For the project to succeed, they have to work together just to understand what they're supposed to achieve, and to identify and resolve misunderstandings along the way. We know that there will be change along the way; we just don't know what will change. We need a process that helps people cope with uncertainty as their experience grows, so that they can anticipate unanticipated changes.

Feedback is the Fundamental Tool

We think that the best approach a team can take is to use empirical feedback to learn about the system and its use, and then apply that learning back to the system. This implies that we need repeated cycles of activity; in each cycle we add new features and get feedback about the quantity and quality of the work we've already done. We split the work into time boxes, within which we analyse, design, implement, and deploy as many features as we can. Deploying completed work at each cycle is critical. Every time we do so, we have an opportunity to check our assumptions against reality: we can measure how much progress we're really making, detect and correct any misunderstandings and errors, and adapt the current plan in response to what we've learnt. Without deployment, there is no feedback.

We apply feedback cycles at every level of development, organising projects as a system of nested feedback loops ranging from seconds to months. Each loop exposes the team's knowledge and work to empirical feedback so that the team can discover misunderstandings and correct them. The more feedback loops a project establishes, the more chance it has to discover and correct errors. The nested feedback loops reinforce each other; if a discrepancy slips through an inner feedback loop, there is a good chance that an outer feedback loop will catch it.

Figure 1.1. Nested Feedback Loops


Different feedback loops address different aspects of the system and development process. When we have completed some unit-tested code, we check it into the source code repository so that it can be integrated with the rest of the system. When we have completed a feature, we deploy a new version of the system to production so that we can see how well it serves our users' needs.

The faster we can get feedback about any aspect of the project the better. In Figure 1.1, “Nested Feedback Loops” we show releases happening every few weeks, which is common in large organisations. Some teams release at granularities of days, or even hours, which gives them an order of magnitude increase in opportunities to receive and respond to feedback from real users.

Practices that Support Change

The only reliable way we can grow a system feature by feature is if we can be sure that new features don't break existing ones, which requires frequent testing to catch regression errors. For systems of any interesting size, frequent manual testing is too expensive and slow, not to mention tedious, so we must automate as much as we can to reduce the costs and risks of building, deploying and modifying the system.

As developers, our strategy for being able to grow a system and accept unanticipated change is to keep the code as easy as possible to understand and, hence, modify. The best way we know to achieve that is to write the simplest code we can. As we all know, simplicity takes effort to achieve, so we need continually to refactor [Fowler1999] our code as we work with it to improve and simplify its design, to remove duplication, and to ensure that it clearly expresses what it's supposed to do. Once again, the test suites in the feedback loops help us achieve this by protecting us against our own mistakes when we improve (i.e., change) the code. The catch is that few developers enjoy testing their code. Compared to solving a problem and capturing that solution in elegant code, testing or even writing automated tests is considered repetitive and boring; and most people do not do well at work they find quite so uninspiring.

Test-Driven Development (TDD) turns this situation on its head. We write our tests before we write the code. Instead of just using testing to verify our work after it's done, Test-Driven Development turns testing into a design activity. We use the tests to help clarify our ideas about what we want the code we're about to write to do and how to structure it, and to give us rapid feedback about the quality of the system. If we write tests all the way through the development process, we can build up a safety net of automated regression tests that give us the confidence to make changes.

… you have nothing to lose but your bugs

We cannot emphasise strongly enough how liberating it is to work on Test-Driven code that has thorough test coverage. We find we can concentrate just on the task in hand, secure that we're doing the right work and that it's actually quite hard to break the system—as long as we follow the practices.

Test-Driven Development in a Nutshell

The cycle at the heart of TDD is: write a test; write some code to get it working; clean up the code to be as simple an implementation of the tested features as possible. Repeat until it's not worth building any more features.

Figure 1.2. The fundamental TDD cycle
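
One turn of this cycle can be sketched in a few lines. This is only an illustration under our own assumptions: the Basket class and its methods are hypothetical names invented for the example, and we use Python with plain assert statements rather than any particular test framework.

```python
# Step 1 (write a test): this is written first. Run before Basket exists,
# it fails, which tells us what to build next.
def test_total_sums_item_prices():
    basket = Basket()
    basket.add("tea", 3)
    basket.add("milk", 2)
    assert basket.total() == 5


# Step 2 (get it working): the simplest code that makes the test pass.
class Basket:
    def __init__(self):
        self._prices = []

    def add(self, name, price):
        # The name is ignored for now; no test requires it yet.
        self._prices.append(price)

    def total(self):
        return sum(self._prices)


# Step 3 (clean up): with the test passing, we are free to simplify or
# rename, re-running the test after each change to confirm nothing broke.
```

Note that the implementation does nothing with the item name. Until a test asks for that behaviour, adding it would be guesswork.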


As we develop the system, we use TDD to give us feedback on the quality of both its implementation (“does it work?”) and design (“is it well structured?”). Developing test-first, we find we benefit twice from the effort. Writing tests:

  • makes us clarify the acceptance criteria for the next piece of work, because we have to ask ourselves how we can tell when we're done (design);

  • encourages us to write loosely coupled components, so they can easily be tested in isolation and, at higher levels, combined together (design);

  • adds an executable description of what the code does (design); and,

  • adds to a complete regression suite (implementation);

while running tests:

  • detects errors while the context is fresh in our mind (implementation); and,

  • lets us know when we've done enough, discouraging “gold plating” and unnecessary features (design).

This feedback cycle can be summed up by the Golden Rule of Test-Driven Development:

The Golden Rule of Test-Driven Development

Never write new functionality without a failing test

The Bigger Picture

It is tempting to start the TDD process by writing unit tests for the classes in the application. This is better than having no tests at all and can catch those basic programming errors that we all know but find so hard to avoid: fence-post errors, incorrect boolean expressions, and the like. But a project with only unit tests is missing out on critical benefits of the TDD process. We've seen projects with high quality code that turned out not to be used, or that could not be integrated with the rest of the system and so had to be rewritten.

How do we know where to start writing code? More importantly, how do we know when to stop writing code? The golden rule tells us what we need to do: write a failing test.

When we're implementing a feature, we start by writing an acceptance test, which exercises the functionality we want to build. While it's failing, an acceptance test demonstrates that the system does not yet implement that feature; when it passes, we're done. When working on a feature, we use its acceptance test to guide us as to whether we actually need the code we're about to write, so that we only write code that's directly relevant. Underneath the acceptance test, we follow the unit level test/implement/refactor cycle to develop the feature; the whole cycle looks like Figure 1.3, “Inner and outer feedback loops in TDD”.

Figure 1.3. Inner and outer feedback loops in TDD.


The outer test loop is a measure of demonstrable progress, and the growing suite of tests protects us against regression failures when we change the system. Acceptance tests often take a while to make pass, certainly more than one check-in episode, so we usually distinguish between acceptance tests we're working on (which will not fail the build), and acceptance tests that have been approved (which must pass during the build); see “Separate tests that measure progress from those that catch regressions”.

The inner loop supports the developers. The unit tests help us to maintain the quality of the code and should pass soon after they've been written. Failing unit tests should never be committed to the source repository.

Testing End-to-End

Wherever possible, an acceptance test should exercise the system end-to-end without directly calling its internal code. An end-to-end test interacts with the system only from the outside: through its user interface, by sending messages as if from third-party systems, by invoking its web services, by parsing reports, and so on. As we discuss in Chapter 6, The Walking Skeleton, the whole behaviour of the system includes its interaction with its external environment. This is often the riskiest and most difficult aspect; we ignore it at our peril. We try to avoid acceptance tests that just exercise the internal objects of the system, unless we really need the speed-up and we already have a stable set of end-to-end tests to provide cover.
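
To illustrate what "only from the outside" means, here is a minimal sketch in which the test starts the system as a separate process and talks to it purely through standard input and output. The "system" is a hypothetical one-line program that upper-cases its input; in a real project it would be the deployed application, reached through whatever external access points it offers.

```python
import subprocess
import sys

# Hypothetical stand-in for the deployed system: a separate program that
# reads text on stdin and writes it back upper-cased on stdout.
SYSTEM_UNDER_TEST = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]


def run_system(input_text):
    """Drive the system only through its external interface."""
    completed = subprocess.run(
        SYSTEM_UNDER_TEST,
        input=input_text,
        capture_output=True,
        text=True,
        timeout=30,
    )
    return completed.returncode, completed.stdout


def test_shouts_back_what_it_is_told():
    returncode, output = run_system("hello\n")
    assert returncode == 0
    assert output == "HELLO\n"
```

The test imports nothing from the system and calls none of its functions, so it would still be valid if the implementation were rewritten from scratch.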

For us, “end-to-end” means even more than just interacting with the system from the outside — that might be better called “edge-to-edge” testing. We prefer to have the end-to-end tests exercise both the system and the process by which it's built and deployed. An automated build, usually triggered by someone checking code into the source repository, will: check out the latest version; compile and unit-test the code; integrate and package the system; perform a production-like deployment into a realistic environment; and, finally, exercise the system through its external access points. This sounds like a lot of effort (it is), but has to be done anyway repeatedly during the software's lifetime. Many of the steps might be fiddly and error-prone, so the end-to-end build cycle is an ideal candidate for automation. You'll see in Chapter 6, The Walking Skeleton how early in a project we try to get this working.

A system is deployable when the acceptance tests all pass, because they should give us enough confidence that everything works. There's still, however, a final step of deploying to production. In many organisations, especially large ones or those that are heavily regulated, building a deployable system is only the start of a release process. The rest, before the new features are finally available to the end users, might involve different kinds of testing, hand-over to operations and data groups, and coordinating with other teams' releases. There may also be additional, non-technical costs involved with a release, such as training, marketing or an impact on service agreements for downtime. The result is a more difficult release cycle than we would like, so we have to understand our whole technical and organisational environment.

Levels of Testing

We build a hierarchy of tests that correspond to some of the nested feedback loops shown in Figure 1.1, “Nested Feedback Loops”:

Acceptance Tests

does the whole system work?

Integration Tests

does our code work against code we can't change?

Unit Tests

do our objects work, are they convenient to work with?

There's been a lot of discussion in the TDD world over the terminology for what we're calling Acceptance Tests: Functional Tests, Customer Tests, System Tests. Worse, our definitions are often not the same as those used by professional software testers. The important thing is to be clear about our intentions. We use Acceptance Tests to help us, with the domain experts, understand and agree what we are going to build next. We also use them to make sure that we haven't broken any existing features as we continue developing.

Our preferred implementation of the “role” of Acceptance Testing is to write End-to-End Tests which, as we just noted, should be as end-to-end as possible; our bias often leads us to use these terms interchangeably although, in some cases, Acceptance Tests might not be end-to-end.

We use the term Integration Tests to mean tests that check how some of our code works with code from outside the team that we can't change. It might be a public framework, such as a persistence mapper, or a library from another team within our organisation. The distinction is that these Integration Tests are there to make sure that whatever abstractions we build over third-party code work as we expect. In a small system, such as the one we develop in Part II, “A Worked Example”, Acceptance Tests might be enough. In most professional development, however, we'll want Integration Tests to help tease out configuration issues with the external packages, and to give quicker feedback than the (inevitably) slower Acceptance Tests.
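
As a sketch of what such a test might look like, the example below treats Python's standard sqlite3 module as the code we can't change and checks a small abstraction of our own against a real (in-memory) database. The CustomerStore class and its table layout are hypothetical, invented for this illustration.

```python
import sqlite3


class CustomerStore:
    """Hypothetical abstraction of ours, built over third-party persistence."""

    def __init__(self, connection):
        self._connection = connection
        self._connection.execute(
            "CREATE TABLE IF NOT EXISTS customers "
            "(name TEXT PRIMARY KEY, email TEXT)"
        )

    def save(self, name, email):
        # Saving an existing name replaces its email address.
        self._connection.execute(
            "INSERT OR REPLACE INTO customers (name, email) VALUES (?, ?)",
            (name, email),
        )
        self._connection.commit()

    def email_for(self, name):
        row = self._connection.execute(
            "SELECT email FROM customers WHERE name = ?", (name,)
        ).fetchone()
        return row[0] if row else None


def test_round_trips_a_customer_through_the_real_database():
    # No fake here: the point is to find out how our code behaves against
    # the real library, including its SQL dialect and configuration.
    store = CustomerStore(sqlite3.connect(":memory:"))
    store.save("ann", "ann@example.com")
    store.save("ann", "ann@example.org")
    assert store.email_for("ann") == "ann@example.org"
    assert store.email_for("bob") is None
```

A test like this runs much faster than a full acceptance test, yet still exercises the assumptions we have made about the external package.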

We won't write much more about techniques for Acceptance and Integration Testing, since both depend on the technologies involved and even the culture of the organisation. You'll see some examples in Part II, “A Worked Example” which we hope give a sense of the motivation for Acceptance Tests and how they fit in the development cycle. Unit Testing techniques, however, are specific to a style of programming, and so are common across all systems that take that approach—in our case, Object-Oriented.

External and Internal Quality

Another way of thinking about what we can learn about a system from its tests is to distinguish between its external and internal quality: external quality is whether the system meets the needs of its customers and users (functional, reliable, available, responsive, etc.), and internal quality is whether it meets the needs of its developers and administrators (easy to understand, easy to change, etc.). Almost everyone can understand the point of external quality; it's usually part of the contract to build. The case for internal quality is equally important but is often harder to make. Internal quality is what lets us cope with continual and unanticipated change which, as we saw at the beginning of this chapter, is an essential fact of working with software. The point of maintaining internal quality is to allow us to modify the system's behaviour safely and predictably, because it minimises the risk that we'd need to tear the code apart.

Running End-to-End Tests tells us about the external quality of our system, and writing them tells us something about how well we (the whole team) understand the domain, but neither tells us how well we've written the code. Writing Unit Tests gives us a lot of feedback about the quality of our code, and running them tells us that we haven't broken any classes but, again, neither gives us much confidence that the system as a whole works. Integration Tests fall somewhere in the middle, as in Figure 1.4.

Figure 1.4. Feedback from Tests


We find that Unit Testing helps maintain internal quality because, to be tested, a unit has to be structured to run in a test fixture, outside the system. A Unit Test for an object needs to create the object, provide its dependencies, interact with it, and check that it behaved as expected. So, to be easy to unit test, an object must have explicit dependencies that can easily be substituted, and have clear responsibilities that can easily be invoked and verified. In software engineering terms, that means that the code must be loosely coupled and highly cohesive, in other words: well-designed.
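
The four steps of a Unit Test described above can be sketched as follows. The StockMonitor and RecordingNotifier classes are hypothetical examples of our own; the point is only that the monitor's single dependency is passed in explicitly, so the test can substitute a simple stand-in for it.

```python
class RecordingNotifier:
    """Test stand-in for a real notifier (which might send email, say)."""

    def __init__(self):
        self.messages = []

    def notify(self, message):
        self.messages.append(message)


class StockMonitor:
    """Hypothetical object under test, with one explicit dependency."""

    def __init__(self, notifier, threshold):
        self._notifier = notifier
        self._threshold = threshold

    def level_changed(self, item, level):
        # One clear responsibility: raise a notification on low stock.
        if level < self._threshold:
            self._notifier.notify(f"{item} is low: {level}")


def test_notifies_when_stock_falls_below_threshold():
    notifier = RecordingNotifier()                 # provide its dependencies
    monitor = StockMonitor(notifier, threshold=5)  # create the object
    monitor.level_changed("tea", 7)                # interact with it
    monitor.level_changed("tea", 3)
    assert notifier.messages == ["tea is low: 3"]  # check how it behaved
```

Because the notifier arrives through the constructor, the test needs no mail server and no knowledge of the rest of the system.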

When we've got this wrong—when a class, for example, is tightly coupled to distant parts of the system, has implicit dependencies, or has too many or unclear responsibilities—we find the Unit Tests difficult to write or understand, so writing Unit Tests first gives us valuable, immediate feedback about our design. Like everyone, we find it tempting to stop testing when our code makes it difficult but we try to resist. We use such difficulties as an opportunity to investigate why the test is hard to write and refactor the code to improve its structure. We call this “Listening to the Tests”, and we show some examples in Chapter 19, Listening to the Tests.
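
A small, hypothetical example of the kind of difficulty we mean: the first class below depends implicitly on the system clock, so any test of it gives different results at different times of day. Making the dependency explicit, here as a callable passed to the constructor, is one possible response to that pain; the class names are ours and purely illustrative.

```python
from datetime import datetime


class HardToTestGreeter:
    def greeting(self):
        # Implicit dependency: reaches out to the system clock, so a test
        # cannot control which branch it exercises.
        hour = datetime.now().hour
        return "Good morning" if hour < 12 else "Good afternoon"


class Greeter:
    def __init__(self, clock):
        # Explicit dependency: any callable that returns a datetime.
        self._clock = clock

    def greeting(self):
        hour = self._clock().hour
        return "Good morning" if hour < 12 else "Good afternoon"


def test_says_good_morning_before_noon():
    greeter = Greeter(clock=lambda: datetime(2024, 1, 1, 9, 0))
    assert greeter.greeting() == "Good morning"


def test_says_good_afternoon_from_noon():
    greeter = Greeter(clock=lambda: datetime(2024, 1, 1, 12, 0))
    assert greeter.greeting() == "Good afternoon"
```

In production code the second class would simply be constructed with datetime.now as its clock. The test was hard to write because the design hid a dependency, and the fix improved both.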