Duplication-Driven Development?

What is the primary force that drives you when doing test-driven development? Surprisingly, it might just be code duplication.

In the book “Test-Driven Development by Example”, Kent Beck uses an implementation of money as a running example of TDD. At one point, a test case says “assert(someFunc(5) == 5 * 2)”, and to satisfy it, someFunc is implemented by simply returning a hard-coded 10. “Now that is dirty” you say (and that’s what I said :). According to the book, the reason for hard-coding is that we want the test case to pass as soon as possible. After that, the refactoring phase takes over and we can clean up the code.
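
To make the step concrete, here is a minimal sketch in Python (the book itself uses Java and a Money class; someFunc and the doubling behavior are this post’s stand-ins):

```python
# The "fake it" step: someFunc is hard-coded just to get the test green.

def someFunc(param):
    return 10  # hard-coded: the fastest way to make the test pass

def test_someFunc():
    # Note the duplication: the value 5 * 2 = 10 now lives in both
    # the test and the implementation.
    assert someFunc(5) == 5 * 2

test_someFunc()
```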

Here comes the interesting part: hard-coding is a terrible habit, right? And we don’t want our bad habits to show up in the code. This sign of bad taste should be reason enough to fix it, right? However, there is a more fundamental reason for getting rid of the constant: code duplication. The constant 5 * 2 = 10 occurs both in the test case and in the code. To remove the duplication, change someFunc(param) to return “param * 2” (later, the book also gets rid of the 2 by replacing it with a member variable). To sum up, instead of acting on “taste”, we apply the well-proven design principle of avoiding code duplication.
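
A sketch of the two refactoring steps just described; the class name Doubler is hypothetical, not from the book:

```python
# Step one: derive the result from the parameter; the duplicated 10 is gone.
def someFunc(param):
    return param * 2

# Step two, paraphrasing the book: promote the remaining 2 to a member
# variable instead of a literal in the function body.
class Doubler:
    def __init__(self, factor=2):
        self.factor = factor

    def someFunc(self, param):
        return param * self.factor

assert someFunc(5) == 5 * 2
assert Doubler().someFunc(5) == 5 * 2
```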

A fascinating insight from the book is that the design of the code, too, can grow naturally from removing code duplication. For example, let’s say we have a class that encapsulates some algorithm. When another class with similar behavior is required, the book suggests we can accept some code duplication (even extensive copy-paste) to pass the tests. After getting the tests to pass, we refactor. Let’s say the difference between the two classes is that they use different algorithms to carry out their work. Apart from that, they are identical. To avoid code duplication, we combine the two classes. Quite naturally, this results in us isolating the algorithms behind an interface. By following the path of least resistance for removing code duplication, we have implemented the strategy pattern (see the sketch below). To summarize, this suggests that code duplication helps us drive the high-level design of our code. If true, it is a comforting thought that refactoring away code duplication naturally leads us to good design.
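
Here is a hedged sketch of that end state in Python; Sorter, bubble_sort and builtin_sort are invented names, and the point is only the shape that remains after merging two classes that differed in nothing but their algorithm:

```python
def bubble_sort(items):
    items = list(items)
    for i in range(len(items)):
        for j in range(len(items) - 1 - i):
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
    return items

def builtin_sort(items):
    return sorted(items)

class Sorter:
    """The merged class: shared behavior stays here, while the varying
    algorithm is injected from outside, i.e. the strategy pattern."""

    def __init__(self, algorithm):
        self.algorithm = algorithm

    def sort(self, items):
        return self.algorithm(items)

assert Sorter(bubble_sort).sort([3, 1, 2]) == Sorter(builtin_sort).sort([3, 1, 2])
```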

All in all, “Test-Driven Development by Example” is well-written, easy to read and quite a good book, albeit maybe more suitable for the aspiring test-driven designer than the seasoned one.

Zero Tolerance: Writing Good Software Is Tough

I want to work with clean and readable code, with decent test coverage. Thus, when I run across some small flaw in the code (e.g. code duplication in some test scripts), it should be fixed. But I admit, sometimes I just convince myself "hm, I'm so close to getting this feature up and running, let's fix this code after that". This almost always gives me more agony later.

So when I had the opportunity to participate in a new project with no dependencies on legacy code, I thought I would try something new, just for fun. Let's call it zero tolerance: "when finding flawed code, fix it now." Flawed code could be anything: smelly code, confusing script names, non-descriptive function names, bad file structure, incomprehensible test cases, you name it. Let nothing pass!
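
For a feel of the scale involved, a typical zero-tolerance fix can be as small as this (a hypothetical example, not from the actual project):

```python
# Before: a non-descriptive name and a magic number, exactly the kind of
# small flaw the policy says to fix on the spot.
def proc(x):
    return x * 86400

# After: the same logic with an intention-revealing name and constant.
SECONDS_PER_DAY = 86400

def days_to_seconds(days):
    return days * SECONDS_PER_DAY

assert days_to_seconds(2) == proc(2)
```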

Trying it out, my first observation was that it was painful! You have this new, fully functional feature within reach, and then some crappy code that requires a one-day refactoring appears. It is tough on your mental health to ignore the easy-to-reach reward and do the really boring stuff instead. Especially when your threshold for what needs to be fixed is so low (actually, it's zero). Sometimes it will feel like all you do is fiddle with old code.

The upside is that all the punishment will make you learn (stick-style rather than carrot). You know that if you don’t concentrate and do things properly the first time, you will suffer later. And like I said, the suffering will inevitably happen when you are just about to write those last lines to get your system ready for a test spin. I would say it motivates you to produce code of higher quality from the beginning.

When I first started doing the zero tolerance thing, I expected it to get easier over time. The funny thing is, the amount of code that needs to be fixed seems to be constant over time! How is that possible? Maybe I am just sloppy and introduce lots of bad code? (Probably, but the punishment makes me improve! :) Maybe it is just impossible to foresee all the problems that may appear? Maybe. But I also think something else is at play.

Large software projects often grind to a halt. The complexity of the software becomes overwhelming and new features are delivered more and more slowly. Why is that? With increasing amounts of code, there are more opportunities for sloppy code, and it piles up. Without proper care, the amount of code that needs to be fixed will grow over time (probably much faster than the code size grows; the broken windows theory doesn't help either). So maybe a cleanup workload that stays constant over time is really a great outcome.

Like I said, I was hoping that the workload from code fixing would actually decrease over time. Maybe software is just hard (in terms of essential complexity), and there's not much we can do about it. We will have to do our best not to make things worse by introducing too much accidental complexity. I will keep doing the zero tolerance thing for a while. I'll report back on my findings if something interesting happens.

If you decide to try the zero tolerance approach, be warned: you should probably do it on a new piece of code. If you choose a piece of code that was written with a normal level of acceptance for design flaws and technical debt, you will probably lose your mind before you get any new functionality in there!