Measure Your Code Un-coverage

Maximizing code coverage is not the way to maximize the benefits of unit testing. Instead (1) identify the most important user scenarios and (2) measure and analyze the code “un-coverage”.

Why do we unit test? There’s a multitude of reasons. We want to feel confident our code works and we want fast feedback. Other benefits are related more to the structure of the code. Good tests imply good testability which in turn implies proper decoupling, user-friendly APIs etc. But what do we unit test?

Code coverage is a metric often tied to unit testing. Code coverage can mean many things. The most commonly used metric is “line coverage”, which compares the number of code lines exercised by all your tests with the total number of lines (normally expressed in ELoCs, effective lines of code: code lines excluding blank lines, comments and curly braces). All coverage metrics work like this, comparing the exercised portion of something (e.g. ELoCs) with the total. Other coverage metrics are “function coverage”, “branch coverage” and “path coverage”.

Having a 60% line coverage means your test execution has touched 60% of your code base. It means close to nothing, however, in terms of quality assurance. You could hit 60% of the code lines and still miss the majority of your most important user scenarios. On the flip side, having a 60% line coverage also means 40% of the code lines have never been executed. Now this is useful information. If a line of code has never been executed, there is a chance the code cannot even be executed without crashing. Or formatting the hard drive, you never know.
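The gap between “lines executed” and “behavior verified” is easy to demonstrate. A minimal sketch in Python (the function and numbers are invented for illustration):

```python
# Full line coverage does not mean the behavior is verified.
def shipping_cost(weight_kg):
    rate = 5 if weight_kg <= 10 else 8  # one line, two branches
    return weight_kg * rate

# This single test executes every line of shipping_cost (100% line
# coverage), yet the over-10-kg branch is never checked: the rate 8
# could be wrong, or crash, and line coverage would not tell us.
assert shipping_cost(5) == 25
```

Branch coverage would flag the untested branch here, which is exactly why line coverage alone says so little about quality assurance.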

So instead of focusing on covering your code with tests, follow this procedure:

  1. Identify the most important user scenarios and alternate paths and implement them as unit tests (or tests on other levels if more appropriate, e.g. component, integration or system level).
  2. Measure your code “un-coverage” and observe which parts of the code are never touched by your test cases. Come up with user scenarios that will exercise the un-covered code. Write tests.
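Step (2) can even be sketched with nothing but the standard library. A toy un-coverage finder (illustrative only; a real project would use a proper coverage tool such as gcov or coverage.py, and all names here are made up):

```python
# Record which lines of a function execute, then report the lines
# the tests never touched -- the "un-coverage".
import dis
import sys

def traced_lines(func, *args):
    """Run func(*args) and return the set of its executed line numbers."""
    executed = set()
    code = func.__code__

    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is code:
            executed.add(frame.f_lineno)
        return tracer

    old = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(old)
    return executed

def classify(n):          # the "system under test"
    if n < 0:
        return "negative"
    return "non-negative"

# All executable lines of classify, taken from its bytecode:
all_lines = {ln for _, ln in dis.findlinestarts(classify.__code__) if ln}

covered = traced_lines(classify, 5)
uncovered = all_lines - covered
negative_line = classify.__code__.co_firstlineno + 2  # return "negative"
# Our test never exercised the negative branch; that uncovered line
# points straight at a missing test case.
assert negative_line in uncovered
```

Reading the un-covered lines and asking “which user scenario would reach this?” is the whole point of the procedure.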

You might still have uncovered code after this procedure. Obtaining 0% code un-coverage (100% code coverage) is expensive. All testing is a trade-off between the risk of releasing something broken and the cost of testing it. When you have covered what is important to your end-user and analyzed the remaining un-covered code, functionality-wise your tests are in great shape even if your coverage is nowhere near 100%.

As a side note: In the case of test-driven development, we will end up with close to 0% un-covered code. But with TDD, bullet (1) is more important than ever. It is easy to get carried away focusing on covering each line of production code with tests and lose focus on what’s important: testing the right thing, i.e. the things that are most important for your end-users. Happy testing.

Zero Tolerance: Writing Good Software Is Tough

I want to work with clean and readable code, with decent test coverage. Thus, when I run across some small flaw in the code (e.g. code duplication in some test scripts), it should be fixed. But I admit, sometimes I just convince myself “hm, I’m so close to getting this feature up and running, let’s fix this code after that”. This almost always gives me more agony later.

So when I had the opportunity to participate in a new project with no dependencies on legacy code, I thought I would try something new, just for fun. Let’s call it zero tolerance: “when finding flawed code, fix it now”. Flawed code could be anything: smelly code, confusing script names, non-descriptive function names, bad file structure, incomprehensible test cases, you name it. Let nothing pass!

Trying it out, my first observation was that it was painful! You have this new, fully functional feature within reach and this crappy code that requires a one-day refactoring appears. It is tough on your mental health to just ignore the easy-to-reach reward and do the really boring stuff. Especially when your threshold to what needs to be fixed is so low (actually, it’s zero). Sometimes it will feel like all you do is fiddle with old code.

Stick or carrot?

The upside is that all the punishment will make you learn (stick-style rather than carrot). You know that if you don’t concentrate and do things properly the first time, you will suffer later. And like I said, the suffering will inevitably happen when you are just about to write those last lines to get your system ready for a test spin. I would say it motivates you to produce code of higher quality from the beginning.

When I first started doing the zero tolerance thing, I expected it to get easier over time. The funny thing is, the amount of code that needs to be fixed seems to be constant over time! How is that possible? Maybe I am just sloppy and introduce lots of bad code? (Probably, but the punishment makes me improve! :) Maybe it is just impossible to foresee all the problems that may appear? Maybe. But I also think something else is at play.

Large software projects often grind to a halt. The complexity of the software becomes overwhelming and new features are delivered more and more slowly. Why is that? With increasing amounts of code, there will be increased opportunities for sloppy code. It will pile up. Without proper care, the amount of code that needs to be fixed will grow over time (probably much faster than the code size itself grows; the broken windows theory doesn’t help either). So maybe staying constant over time is really a great outcome.

Like I said, I was hoping that the work load from code fixing would actually decrease over time. Maybe software is just hard (in terms of essential complexity), and there’s not much we can do about it. We will have to do our best not to make things worse by introducing too much accidental complexity. I will keep doing the zero tolerance thing for a while. I’ll report back on my findings if something interesting happens.

If you decide to try the zero tolerance approach, be warned: you should probably do it on a new piece of code. If you choose a piece of code that was written with a normal level of acceptance for design flaws and technical debt, you will probably lose your mind before you get any new functionality in there!

Lean Software Development

I just finished reading the book “Implementing Lean Software Development – From Concept to Cash” by Mary and Tom Poppendieck. In essence, “lean” means “reduce waste”, and “waste” means “everything that does not make your customers happier”. Many concepts in the book were pioneered by Toyota in manufacturing; Toyota then moved on and introduced lean in product development (including software).

When a customer asks you for a feature, how long does it take until they get it (calendar time)? And how much effective time do you spend implementing it? The difference between calendar time and implementation time is waste (since waiting makes your customer unhappy). This waste can be analyzed using Value Stream Mapping. Typical things that take time for no good reason are waiting for approval (solution: approve within 24 hours instead of in the bi-weekly meeting) and long, fixed release cycles. I have come across really long release cycles in previous workplaces and when talking to friends at other companies. (Half a year or a year is not unusual, which should be contrasted with the book’s claim that Toyota developed and released the Prius within a year and a half!) The solution is to release and/or deploy more often. In order to do this, we need a relentless focus on quality (= no bugs).

According to the book, having bugs in a bug tracking system is waste. Software with bugs in it is not finished work, and piling up unfinished work is waste (since it cannot be released to your customer until it’s finished). Also, someone has taken the time to enter the information, and someone must look at it to resolve the bug. Meanwhile, someone else might stumble across the same bug. All this takes time. Lean software development says you should solve the bugs right now. In order to make changes to solve the bug right away, you need tests to make sure you don’t break something else. Good quality software starts with rigorous and automated tests on acceptance and unit level (see my previous posts: test-driven development done right and edge-to-edge unit tests). Reducing the extra work from bugs will free up resources to implement new features.
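The “solve the bug right now, guarded by a test” loop might look like this hypothetical example (function and bug invented for illustration): write a test that reproduces the report, watch it fail, fix the code, and keep the test as a regression guard:

```python
# Hypothetical bug report: "word count is wrong when words are separated
# by more than one space". The buggy version used text.split(" "), which
# produces empty strings for repeated spaces; split() without arguments
# collapses runs of whitespace.
def word_count(text):
    return len(text.split())

# The test that reproduced the report stays in the suite forever,
# so the same bug cannot silently come back:
assert word_count("concept  to   cash") == 3
assert word_count("") == 0
```

With the fix released immediately, there is nothing to enter, triage or track in a bug database: no unfinished work piling up.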

Lean software development is about continuous improvements. Every day, we should ask questions like: How do I make my customer happier? How do I reduce waste in order to release faster? But who should answer these questions and implement the improvements? Lean says it’s the people closest to the problems. Inevitably, in software development, that will be the programmers! I think this is the most appealing part of the book.

This is a fantastic book and I am sure I will read it again pretty soon. Read it!

Edge-To-Edge Unit Tests

The term unit test implies at least two different things. First, it means testing your code at the smallest unit, which is a function or a class (well, technically, you test the methods of a class, but they would make little sense in isolation). Second, it means writing test code in the same language as the functionality under test.

Normally, when I write C++ code to test my C++ functionality, I tend to stay away from the “unit” level. Instead, I like tests that exercise the system edge-to-edge, resembling the interactions with the outside world as much as possible. Now, full edge-to-edge testing is normally not possible since the peripheral parts of a system are often hard to control. For example, let’s say I have a system with network on one side and a GUI on the other side. A realistic test case would have traffic over the network and a GUI reflecting that. But taking the network as an example, it complicates testing due to issues like slow response times, need for a remote side and network failures. So I settle for testing up to the interfaces of the network and GUI: you would inject network “traffic” (or time-outs) on the network interface, verify that the GUI interface is told to show something, do some user input on the GUI interface and watch outgoing network traffic being generated.
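A minimal sketch of that shape, in Python for brevity (all class and method names are invented; a real system would sit behind its actual network and GUI interfaces):

```python
# Edge-to-edge unit test: fake interfaces on both edges, real system between.

class FakeNetwork:
    """Stands in for the network interface; lets the test inject traffic."""
    def __init__(self):
        self.sent = []          # outgoing messages, inspectable by the test
        self.on_receive = None  # the system registers its handler here
    def inject(self, message):  # the test pushes "incoming traffic"
        self.on_receive(message)
    def send(self, message):
        self.sent.append(message)

class FakeGui:
    """Stands in for the GUI interface; records what it is told to show."""
    def __init__(self):
        self.shown = []
        self.on_input = None
    def show(self, text):
        self.shown.append(text)
    def click_send(self, text):  # the test simulates user input
        self.on_input(text)

class ChatSystem:
    """The system under test, wired edge-to-edge between the interfaces."""
    def __init__(self, network, gui):
        self.network, self.gui = network, gui
        network.on_receive = self.handle_message
        gui.on_input = self.handle_input
    def handle_message(self, message):
        self.gui.show(f"peer: {message}")
    def handle_input(self, text):
        self.network.send(text)

# Inject traffic, watch the GUI; do user input, watch outgoing traffic.
net, gui = FakeNetwork(), FakeGui()
system = ChatSystem(net, gui)
net.inject("hello")
assert gui.shown == ["peer: hello"]
gui.click_send("hi back")
assert net.sent == ["hi back"]
```

The test never touches sockets or widgets, yet it drives a complete user scenario through the whole system.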

As with everything, there are pros and cons when testing at this level. To me, the main benefits compared to low-level unit tests are:

  • Having tests at that level gives me confidence that there’s a reasonably low probability of faulty interaction with the outside world.
  • The boundaries of your system are much less likely to change than the internals. This means you are less likely to spend time changing your tests.
  • It is easy to argue for the business value of the tests. They correspond well to what the customer expects and are used to guarantee the quality.
  • The tests are a decent measure of progress. Having a passing test means you are close to something to show to your customer.
  • It’s fun! And you can test-run your system before you have a network and a GUI.

The downsides I’ve experienced compared to testing at the unit level are:

  • Tests like these can make it harder to achieve decent code coverage. For example, your code might involve randomness, timing issues or use of the current time. You will have to make sure these can be controlled from the test context.
  • High-level testing can be hard to introduce late in the development process. For this to succeed, the whole system must be designed for testability. See the previous item.
  • The tests become monolithic. I’ve come across the situation where parts of my system were broken out to form a new shared component. The new component has to have tests of its own (or someone changing it won’t notice it’s broken until they run your tests). Your tests use the classes of your system, which are not suitable to use in a shared component since dependencies would go the wrong way.
  • It might be overkill for testing some parts of the system. For example, if you have some deep-down string manipulation code, you should go ahead and unit test it (in the true sense of the word). It’s all about choosing the proper tools for the problem.
  • Due to complexity in the lower levels of your software, you might be facing a combinatorial explosion of different test cases. You will have to select a few representative test cases and resort to normal unit testing of the low-level parts. See the previous item.
  • Testing on this level poses sort of a communication problem. If I call my tests “unit tests”, most people think only of tests on the lowest level. If I call them “acceptance tests” or “functional tests”, someone will inevitably assume I have properly tested the system from the outside, edge-to-edge (which is definitely necessary, even with the tests described above). Calling them e.g. “functional unit tests” only adds to the confusion. (“What do you mean? Is it a unit test? Is it a functional test? Surely, it can’t be both.”) If you know of terminology that could help, let me know. Until I hear from you, I will just call them Edge-to-Edge Unit Tests.
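On the first downside above, randomness, timing and the current time are usually brought under test control by injecting them instead of reaching for the global clock and RNG inside the code. A minimal sketch (all names invented):

```python
# Inject the clock and the random source so tests can make them deterministic.
import random
import time

class Session:
    def __init__(self, clock=time.time, rng=None):
        self.clock = clock                  # callable returning "now"
        self.rng = rng or random.Random()   # random source

    def token(self):
        # A timestamp-plus-random token; with the real clock and RNG this
        # is different on every call, which would make assertions flaky.
        return f"{int(self.clock())}-{self.rng.randint(0, 9999):04d}"

# Production code uses the defaults; the test injects determinism:
fixed_clock = lambda: 1_000_000
a = Session(clock=fixed_clock, rng=random.Random(42))
b = Session(clock=fixed_clock, rng=random.Random(42))
assert a.token() == b.token()            # fully reproducible
assert a.token().startswith("1000000-")  # time is under test control
```

The same seam that makes the edge-to-edge test deterministic is also what the bullet about late testability is warning about: it has to be designed in.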

As I said before, it’s about choosing the proper tool for your problem. If at all possible, I resort to “unit testing” at the highest possible level. If you haven’t done so, you should give it a try.