There is quite a bit of disagreement on how Test Driven Development affects development speed and code quality. As with any programming methodology, the success of TDD depends on many variables, but research suggests that the most critical factor in the success of TDD projects is the definition of done. Case studies of projects developed at IBM and Microsoft bring a bit of empirical evidence to a debate that has been dominated mainly by anecdotes and opinion.

As a quick refresher, the TDD cycle looks like this:

1. Create a test
2. Run all tests, making sure the new test fails
3. Write the minimum amount of code to make the test pass
4. Run all of the tests to ensure they pass
5. Refactor the code, making sure that the tests still pass
6. Repeat for each new code module being developed
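
One pass through this cycle can be sketched as follows. This is only a minimal illustration, assuming Python and its standard unittest module; the slugify function and its tests are hypothetical and were chosen for brevity, not taken from any of the projects discussed here.

```python
import re
import unittest


# The minimum code needed to make the tests below pass. In a real TDD
# cycle this function would not exist when the tests are first run, so
# that first run fails.
def slugify(title):
    """Lowercase a title and join its words with hyphens (hypothetical example)."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


# The tests are written before the code they exercise.
class SlugifyTest(unittest.TestCase):
    def test_words_are_lowercased_and_hyphenated(self):
        self.assertEqual(slugify("Meaning of Done"), "meaning-of-done")

    def test_punctuation_is_dropped(self):
        self.assertEqual(slugify("TDD: Faster?"), "tdd-faster")


def run_tests():
    """Run the whole suite; repeat this after every change and every refactoring."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SlugifyTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    result = run_tests()
    print("all tests pass:", result.wasSuccessful())
```

Once the suite passes, slugify can be refactored freely, with run_tests() acting as the safety net before the next test is written.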

TDD has strong supporters and detractors. At one extreme, agile consultants like Robert Martin (Uncle Bob) fiercely advocate a TDD process in all cases, asserting that unit testing saves time during development. At the other, greybeards like Knuth talk about using a large up-front design process with absolutely no testing until the end of a project. These are the outliers, though: not many of us make our living as Agile consultants and authors, nor are we writing the definitive text on Computer Science.

Peter Seibel used excerpts from his book, Coders at Work, to put together a nice synopsis of both sides of the TDD debate. The central criticism of TDD is that it slows developers down, while TDD proponents claim that unit testing does not slow a project down, and that in some cases TDD actually speeds things up. So who is right here?

Research done by IBM and Microsoft indicates that TDD teams took 15-35% longer than teams using more traditional development practices. However, the bug density of projects developed using TDD decreased 40-90% relative to similar projects that did not practice TDD. There is nothing here to support the claim that Test Driven Development is faster, but the research leaves an important question unanswered.

If we control for defect rate, is TDD faster? Could the non-TDD teams have produced code with a similar defect rate if they had used as much development time as the TDD teams? I would be willing to bet that they could. So why didn’t the non-TDD teams take more time in order to reduce their defect rate? Why did they declare the project done and ship with so many outstanding bugs?
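
To make the question concrete, here is a back-of-the-envelope sketch in Python. Only the percentage ranges come from the research quoted above; the baseline of 100 days and 100 defects is an assumption chosen for round numbers, and the function names are hypothetical. The studies as summarized here do not say which time overhead goes with which defect reduction, so the last function gives only rough bounds.

```python
# Ranges reported by the research: TDD teams took 15-35% longer and
# saw bug density fall by 40-90%.
EXTRA_TIME_PCT = (15, 35)
DEFECT_REDUCTION_PCT = (40, 90)


def tdd_range(baseline_days, baseline_defects):
    """Return ((min_days, max_days), (min_defects, max_defects)) for a TDD project.

    The baseline describes a comparable non-TDD project and is an
    assumption of this sketch, not a figure from the studies.
    """
    days = tuple(baseline_days * (100 + p) / 100 for p in EXTRA_TIME_PCT)
    low_cut, high_cut = DEFECT_REDUCTION_PCT
    defects = (
        baseline_defects * (100 - high_cut) / 100,  # largest reduction
        baseline_defects * (100 - low_cut) / 100,   # smallest reduction
    )
    return days, defects


def required_fix_rate(baseline_days, baseline_defects):
    """Defects per extra day a non-TDD team must remove to match TDD.

    Returns (lowest, highest): the lowest rate pairs the longest schedule
    with the smallest reduction, the highest pairs the shortest schedule
    with the largest reduction.
    """
    (min_days, max_days), (min_defects, max_defects) = tdd_range(
        baseline_days, baseline_defects
    )
    lowest = (baseline_defects - max_defects) / (max_days - baseline_days)
    highest = (baseline_defects - min_defects) / (min_days - baseline_days)
    return lowest, highest


if __name__ == "__main__":
    print(tdd_range(100, 100))
    print(required_fix_rate(100, 100))
```

Under these assumed numbers, the non-TDD team would have to remove somewhere between roughly one and six defects for every extra day it was given in order to match the TDD team.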

The IBM-Microsoft research used case studies of real projects developed in a professional setting. Accordingly, these projects experienced the same schedule pressures and race-to-the-finish bug fixing sessions that professional programmers are all too familiar with. TDD developers iterate rapidly between test and code. Since testing is such an integral part of the development methodology, they are unlikely to skimp on test coverage even when deadlines loom. However, non-TDD teams are more likely to back-load testing, which allows them to meet unrealistic deadlines by sacrificing testing time. Anecdotal evidence certainly supports this scenario.

There’s bound to be stuff where this would have gone faster if we’d had unit tests or smaller modules or whatever. That all sounds great in principle. Given a leisurely development pace, that’s certainly the way to go. But when you’re looking at, “We’ve got to go from zero to done in six weeks,” well, I can’t do that unless I cut something out. And what I’m going to cut out is the stuff that’s not absolutely critical. And unit tests are not critical.
-- Jamie Zawinski in Coders at Work

Shunting the testing phase off until the end of a project makes it particularly vulnerable to schedule pressure, because the code is in some sense done when all of the features have been implemented. Even though the code is buggy, there is in fact a shippable product. It may be very tempting to just ship it, especially if the project has already experienced scheduling delays. In contrast, the same application developed with TDD won’t be nearly as vulnerable to schedule pressure because the test-up-front methodology delays the point where there is a shippable product.

So perhaps the most valuable effect of TDD is just a side effect of upfront unit testing: it relieves schedule pressure and allows teams to delay the point at which code can be called done.

Discussion on
Test Driven Development and the Meaning of Done

by Jess Johnson in Tips & Tutorials

11 Comments

Bryan
9:21 am UTC, 2010-12-02
Excellent points. I'd be curious to know if the TDD team was still slower when it comes to version 2 and version 3 of the software. I did notice a couple quick issues you might want to know about: you double the word "strong" just after the TDD steps, and I think you meant relieve rather than relive in the last sentence. Thanks for posting a great entry!
anon
2:50 pm UTC, 2010-12-02
This is comparing apples and oranges. The two development teams in the test should keep working until they have reached (roughly) the same number of bugs or hours invested in the project and then compare the remaining free variable.
Jon
3:40 pm UTC, 2010-12-02
Why did they declare the project done and ship with so many outstanding bugs? Why, indeed! To my mind, "done" means "feature complete and free of defects." Hell, I could build a spaceship in a week if it doesn't have to actually work. "Survive the stress of lift-off or the heat of atmospheric re-entry? Well, we didn't exactly test for that..." And if the project is truly calendar-bound, then I would challenge the client for a smaller feature list -- build fewer reliable features. It may not do as much, but what is there works. Well. But then, I use an Agile approach which strives for maximum transparency. The client knows exactly what is possible based on the team's velocity and we simply reflect reality, not wishful thinking. We don't "hope" for good results and settle for whatever happens. IMO that's just unprofessional. I, too, would be interested in the point Bryan raised -- what happens when it's time to build v2 or v3? IMO, bugs are like an infection -- the longer that bugs are allowed to fester, the deeper their impact and the greater the difficulty when it comes time to solve them due to side effects. Similar to poor design decisions. A shortcut which "saves time" up front can cost many hours/days later due to shortcomings. As an aside, I simply LOVE the "Programmers/Coders at work" series for the insights into the minds and working styles.
Jess Johnson
3:30 am UTC, 2011-02-03
I absolutely agree with your definition of done, and wouldn't want to call any project of mine "done" unless it was feature complete and free of defects. However, I can imagine a few scenarios where it might be better to ship a buggy feature than not to ship it at all. A startup that wants a first mover advantage might fit into this category. Or to take your analogy a bit further, what if your spaceship was the only hope to destroy a comet on course to demolish the earth in a week?
Enzo
10:29 pm UTC, 2011-05-31
It is the 80/20 rule. 20% of the code base will cause 80% of the bugs. Not all bugs even matter. Spending so much time testing can be even more wasteful than not. If the time taken to write and maintain unit tests is greater than the cost of just fixing the bug, then you are wasting time and resources. Now you have to bug fix your regular code and you have to bug fix your test code too. If you can't write regular code properly, what makes you think you will write test code properly? Don't you then need a unit test for the test code?
Jeff L.
4:57 pm UTC, 2012-08-28
An odd, somewhat cynical conclusion. Doesn't introducing a significant defect (i.e. one that must be fixed) that is found by QA delay the ability to ship product? The cost of managing each defect also incurs incremental delays that do add up. More studies would be useful here, but "faster to develop" isn't the only relevant factor. What is the cost of a defect? It's not simply the time to fix it. Part of the cost gets shifted to the customer in the form of wasted time or inability to effectively complete work, part of it to your company in the form of support calls. When a defect is significant enough, or when you have gobs of them, you lose customers (seen it). You also lose opportunity time while you fix it, and the cost to fix will often increase as the time between introduction and discovery increases. Other studies might examine TDD (practiced with high levels of effort to incrementally keep the code clean) and corresponding codebases over time. Most codebases are not factored well and contain 50-100% more code than they need (this is anecdotal but based off seeing hundreds of customer systems). What is the long-term cost of a code base that is difficult to understand and maintain? How many of us have spent an afternoon deciphering and struggling with adding a small feature that should have taken only 15 minutes? If I did TDD only to minimize defects, I'd probably worry about the research and numbers more. But I'd also insist on a proper accounting for the true cost of defects.
Jeff L.
5:08 pm UTC, 2012-08-28
Hi Jess, I'm ok with the lean startup notion of shipping code that's likely to be crappy. That's a calculated risk. The agile folks make the case for a different tradeoff--shipping a smaller feature set of high quality, then incrementing the product. Both can work; depends on the context. I've done enough of both TDD and not TDD. It usually only takes me a couple days before I start recognizing time I'm wasting because I'm *not* doing TDD. If my spaceship was the only hope, TDD would start paying off a day or two in. I'd rather have a tool that prevented me from incorporating a major defect that would cause it to crash and burn before I smugly sent it up at the last minute. Contrived examples aside, I've also seen shops that hacked together quick solutions and shipped them to some success. Years later, every developer cursed every minute of working on the software, now several hundred thousand lines built atop a core hack-base of maybe 50,000 lines of really shitty code. Lack of quality costs far more in the long run.
J. B. Rainsberger
11:17 am UTC, 2012-08-29
I strongly suspect, with admittedly only mostly anecdotal evidence and personal experience as justification, that giving the non-TDD teams more time will help much less than you'd expect, primarily because the people on those teams spend the extra time thinking about algorithms and data structures, rather than thinking about interfaces and behavior. In other words, simply giving the non-TDD teams more time will not encourage them to think about the kinds of things that lead to asking "Have we done everything we need to do yet?" TDD provides but one mechanism to encourage people to ask this question -- indeed, it encourages people to make a habit out of asking this question.
J. B. Rainsberger
11:19 am UTC, 2012-08-29
When I write a test, I have to clarify what I'm trying to do. This act alone reduces the number of mistakes I make. True, it doesn't focus that reduction in "the parts that matter", but I don't usually know before building the system which parts will matter. I can guess, but I don't think I can guess well enough to trust those guesses.
shmoo
3:24 pm UTC, 2012-11-21
I've seen many TDDers write terrible code, all of them falling for the myth that testability=good design. This is wrong. Testability is a tiny part. It's been overwhelmingly my experience that TDDed code has to be re-written by others within only a couple of sprints because it's so bad.
Eric
3:57 pm UTC, 2015-10-16
Didn't see this anywhere in the comments so I thought I'd point it out. TDD isn't the only effective defect reduction mechanism in town. In fact, it's not even considered to be the most effective. Individual developer and team design and code reviews are usually far more effective at identifying defects than TDD alone. Especially for non-trivial systems. Had those non-TDD teams been doing design and code reviews rather than TDD, they probably would have released higher-quality software, and they would have done so more quickly. Only practicing high-yield design and code reviews using regularly updated checklists of common defects will convince you of this, though. I've seen it work wonders in my own code and in my organization.
