Everyone who wants to get serious about unit testing should follow the lead of Edgeware and dedicate half a day or so of their developers’ and testers’ time to identifying strengths and weaknesses within the development organization. I was invited to present best practices of unit testing. We also touched upon continuous integration and future directions in terms of development excellence. It was fun and interesting and we had good discussions. :) Perhaps the slides (pdf) can give you some inspiration to improve your unit testing and development process.
Maximizing code coverage is not the way to maximize the benefits of unit testing. Instead (1) identify the most important user scenarios and (2) measure and analyze the code “un-coverage”.
Why do we unit test? There’s a multitude of reasons. We want to feel confident our code works and we want fast feedback. Other benefits are related more to the structure of the code. Good tests imply good testability which in turn implies proper decoupling, user-friendly APIs etc. But what do we unit test?
Code coverage is a metric often tied to unit testing. Code coverage can mean many things. The most commonly used metric is “line coverage”, which compares the code lines exercised by all your tests with the total number of lines (normally expressed in ELoCs, effective lines of code = code lines excluding blank lines, comments and curly braces). All coverage metrics work the same way: they compare the number of items exercised by the tests (e.g. ELoCs) with the total number of such items. Other coverage metrics are “function coverage”, “branch coverage” and “path coverage”.
Having a 60% line coverage means your test execution has touched 60% of your code base. It means close to nothing, however, in terms of quality assurance. You could hit 60% of the code lines and still miss the majority of your most important user scenarios. On the flip side, having a 60% line coverage also means 40% of the code lines have never been executed. Now this is useful information. If a line of code has never been executed, there is a chance the code cannot even be executed without crashing. Or formatting the hard drive, you never know.
So instead of focusing on covering your code with tests, follow this procedure:
- Identify the most important user scenarios and alternate paths and implement them as unit tests (or tests on other levels if more appropriate, e.g. component, integration or system level).
- Measure your code “un-coverage” and observe which parts of the code are never touched by your test cases. Come up with user scenarios that will exercise the un-covered code. Write tests.
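The second step can be sketched in a few lines of C++. This is only an illustration of the arithmetic, not how any particular coverage tool works: the `CoverageReport` struct and both functions are hypothetical names made up for this example, and a real tool would give you a much richer report.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical summary of one coverage run. Real coverage tools produce
// richer reports; this only mirrors the arithmetic described in the text.
struct CoverageReport {
    std::set<int> effectiveLines;  // line numbers counted as ELoCs
    std::set<int> executedLines;   // line numbers touched by at least one test
};

// Line coverage in percent: executed ELoCs divided by all ELoCs.
double lineCoveragePercent(const CoverageReport& report) {
    if (report.effectiveLines.empty()) return 100.0;  // nothing to cover
    std::size_t hit = 0;
    for (int line : report.effectiveLines) {
        if (report.executedLines.count(line)) ++hit;
    }
    return 100.0 * hit / report.effectiveLines.size();
}

// The "un-coverage": every effective line that no test has executed.
// These are the lines worth reading when looking for missing scenarios.
std::vector<int> uncoveredLines(const CoverageReport& report) {
    std::vector<int> result;
    for (int line : report.effectiveLines) {
        if (!report.executedLines.count(line)) result.push_back(line);
    }
    return result;
}
```

The percentage is the number that usually ends up on a dashboard, but the list returned by `uncoveredLines` is the part that tells you where to look for user scenarios you have not thought of yet.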
You might still have uncovered code after this procedure. Obtaining 0% code un-coverage (100% code coverage) is expensive. All testing is a trade-off between the risk of releasing something broken and the cost of testing it. When you have covered what is important to your end-user and analyzed the remaining un-covered code, functionality-wise your tests are in great shape even if your coverage is nowhere near 100%.
As a side note: In the case of test-driven development, we will end up with close to 0% un-covered code. But with TDD, bullet (1) is more important than ever. It is easy to get carried away focusing on covering each line of production code with tests and lose focus on what’s important: testing the right thing, i.e. the things that are most important for your end-users. Happy testing.
The term unit test implies at least two different things. First, it means testing your code at the smallest unit, which is a function or a class (well, technically, you test the methods of a class, but they would make little sense in isolation). Second, it means writing test code in a language to test functionality in the same language.
Normally, when I write C++ code to test my C++ functionality, I tend to stay away from the “unit” level. Instead, I like tests that exercise the system edge-to-edge, resembling the interactions with the outside world as much as possible. Now, full edge-to-edge testing is normally not possible since the peripheral parts of a system are often hard to control. For example, let’s say I have a system with a network on one side and a GUI on the other side. A realistic test case would have traffic over the network and a GUI reflecting that. But a real network complicates testing: response times are slow, a remote side is needed and the network can fail. So I settle for testing up to the interfaces of the network and the GUI: I inject network “traffic” (or time-outs) on the network interface, verify that the GUI interface is told to show something, do some user input on the GUI interface and watch outgoing network traffic being generated.
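To make this concrete, here is a minimal sketch of such a test. Everything in it is an assumption made up for illustration: the `Network` and `Gui` interfaces, the `ChatSystem` class standing in for "the system" and the two fakes. A real system would have its own boundary interfaces and a lot more between them.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical boundary interfaces; a real system would define its own.
class Network {
public:
    virtual ~Network() = default;
    virtual void send(const std::string& message) = 0;
};

class Gui {
public:
    virtual ~Gui() = default;
    virtual void show(const std::string& text) = 0;
};

// Hypothetical system under test: everything between the two interfaces.
class ChatSystem {
public:
    ChatSystem(Network& network, Gui& gui) : network_(network), gui_(gui) {}
    // Called by the network layer when data arrives.
    void onNetworkMessage(const std::string& message) { gui_.show("Peer: " + message); }
    // Called by the network layer when the remote side stops answering.
    void onNetworkTimeout() { gui_.show("Connection lost"); }
    // Called by the GUI layer when the user submits text.
    void onUserInput(const std::string& text) { network_.send(text); }
private:
    Network& network_;
    Gui& gui_;
};

// Test doubles that record what the system asked the outside world to do.
class FakeNetwork : public Network {
public:
    void send(const std::string& message) override { sent.push_back(message); }
    std::vector<std::string> sent;
};

class FakeGui : public Gui {
public:
    void show(const std::string& text) override { shown.push_back(text); }
    std::vector<std::string> shown;
};

void testTrafficFlowsBetweenNetworkAndGui() {
    FakeNetwork network;
    FakeGui gui;
    ChatSystem chat(network, gui);

    // Inject incoming "traffic" and check what the GUI is told to show.
    chat.onNetworkMessage("hello");
    assert(gui.shown.back() == "Peer: hello");

    // Inject a time-out instead of waiting for a real one.
    chat.onNetworkTimeout();
    assert(gui.shown.back() == "Connection lost");

    // Simulate user input and watch the outgoing traffic.
    chat.onUserInput("hi there");
    assert(network.sent.size() == 1 && network.sent[0] == "hi there");
}
```

The test never opens a socket or draws a window, yet it drives the system from one edge to the other. Call `testTrafficFlowsBetweenNetworkAndGui()` from whatever test runner you use.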
As with everything, there are pros and cons when testing at this level. To me, the main benefits compared to low-level unit tests are:
- Having tests at that level gives me confidence that there’s a reasonably low probability of faulty interaction with the outside world.
- The boundaries of your system are much less likely to change than the internals. This means you are less likely to spend time changing your tests.
- It is easy to argue for the business value of the tests. They correspond well to what the customer expects and are used to guarantee the quality.
- The tests are a decent measure of progress. Having a passing test means you are close to something to show to your customer.
- It’s fun! And you can test-run your system before you have a network and a GUI.
The downsides I’ve experienced compared to testing at the unit level are:
- Tests like these can make it harder to achieve decent code coverage. For example, your code might involve randomness, timing issues or use of the current time. You will have to make sure these can be controlled from the test context.
- High-level testing can be hard to introduce late in the development process. For this to succeed, the whole system must be designed for testability. See the previous item.
- The tests become monolithic. I’ve come across the situation where parts of my system were broken out to form a new shared component. The new component has to have tests of its own (or someone changing it won’t notice it’s broken until they run my tests). But my existing tests use the classes of my system, so they cannot simply be moved into the shared component: the dependencies would go the wrong way.
- It might be overkill for testing some parts of the system. For example, if you have some deep-down string manipulation code, you should go ahead and unit test it (in the true sense of the word). It’s all about choosing the proper tools for the problem.
- Due to complexity in the lower levels of your software, you might be facing a combinatorial explosion of different test cases. You will have to select a few representative test cases and resort to normal unit testing of the low-level parts. See the previous item.
- Testing on this level poses sort of a communication problem. If I call my tests “unit tests”, most people think only of tests on the lowest level. If I call them “acceptance tests” or “functional tests”, someone will inevitably assume I have properly tested the system from the outside, edge-to-edge (which is definitely necessary, even with the tests described above). Calling them e.g. “functional unit tests” only adds to the confusion. (“What do you mean? Is it a unit test? Is it a functional test? Surely, it can’t be both.”) If you know of terminology that could help, let me know. Until I hear from you, I will just call them Edge-to-Edge Unit Tests.
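The first downside in the list above mentions code that uses the current time. One common way to put that under test control is to hide the clock behind an interface and hand the system a manual clock in tests. The sketch below assumes this approach; `Clock`, `ManualClock` and `SessionTimer` are hypothetical names invented for the example, not part of any library.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical clock interface so a test can decide what "now" is.
class Clock {
public:
    virtual ~Clock() = default;
    virtual std::chrono::steady_clock::time_point now() const = 0;
};

// Production implementation: delegates to the real clock.
class SystemClock : public Clock {
public:
    std::chrono::steady_clock::time_point now() const override {
        return std::chrono::steady_clock::now();
    }
};

// Test implementation: time only moves when the test says so.
class ManualClock : public Clock {
public:
    std::chrono::steady_clock::time_point now() const override { return current_; }
    void advance(std::chrono::milliseconds delta) { current_ += delta; }
private:
    std::chrono::steady_clock::time_point current_{};
};

// Hypothetical piece of production code that depends on the current time.
class SessionTimer {
public:
    SessionTimer(const Clock& clock, std::chrono::milliseconds timeout)
        : clock_(clock), timeout_(timeout), lastActivity_(clock.now()) {}
    void touch() { lastActivity_ = clock_.now(); }
    bool expired() const { return clock_.now() - lastActivity_ >= timeout_; }
private:
    const Clock& clock_;
    std::chrono::milliseconds timeout_;
    std::chrono::steady_clock::time_point lastActivity_;
};

void testSessionExpiresAfterTimeout() {
    ManualClock clock;
    SessionTimer session(clock, std::chrono::seconds(30));

    assert(!session.expired());
    clock.advance(std::chrono::seconds(29));
    assert(!session.expired());   // one second left
    clock.advance(std::chrono::seconds(1));
    assert(session.expired());    // timeout reached without any real waiting
    session.touch();
    assert(!session.expired());   // activity resets the timer
}
```

The same trick works for randomness (inject the random source) and is a small example of what "designed for testability" means in practice: the test takes 0 seconds instead of 30.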
As I said before, it’s about choosing the proper tool for your problem. If at all possible, I resort to “unit testing” at the highest possible level. If you haven’t done so, you should give it a try.