Unit Test Presentation at Edgeware

Everyone who wants to get serious with unit testing should follow the lead of Edgeware and dedicate half a day or so of their developers’ and testers’ time to identifying strengths and weaknesses within the development organization. I was invited to present best practices of unit testing. We also touched upon continuous integration and future directions in terms of development excellence. It was fun and interesting and we had good discussions. :) Perhaps the slides (pdf) can give you some inspiration to improve your unit testing and development process.

Was This Test Written to Specify or Fixate Behavior?

When you read unit test code, have you ever wondered why some tests tell a story (easy to read) while others just feel mechanical (more computer-readable than human-readable)?

While I have no definitive answers on the subject of human-readable versus machine-readable unit tests, I have an insight that came to me and a colleague some time ago: some tests are written as a specification, and some tests are written to fixate behavior. Let me explain what I mean.

When you write a test as a specification, the test code is normally written upfront or together with the code. The purpose of the test code is to document the behavior of the code. Thus, the developer makes an effort to make the test code human-readable. The tests are often used as acceptance criteria, to verify that a piece of functionality was implemented correctly. The fact that you get an executable specification is a great bonus, since the test code would be valuable even just as an example.

Other tests are written to fixate the existing behavior of the code. Typical examples are tests written long after the production code in order to cover legacy code with tests. Legacy code can be hard to test, so the test code is difficult to write in a human-readable way and tends to be more “mechanical” in nature. The focus of these tests is execution, since the purpose is to make sure refactoring does not break existing behavior.
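
To make the distinction concrete, here is a minimal JUnit sketch (all class names are hypothetical, invented for illustration): the first test reads like a specification of shopping-cart behavior, while the second merely pins down whatever a legacy price routine currently returns.

```java
import org.junit.Test;
import static org.junit.Assert.*;

public class SpecificationVsFixationExample {

    // Written as a specification: the name and body tell a story about behavior.
    @Test
    public void emptyCartShouldHaveZeroTotal() {
        Cart cart = new Cart();
        assertEquals(0, cart.total());
    }

    // Written to fixate: it pins down the current output of legacy code,
    // so that a later refactoring can be checked against it.
    @Test
    public void legacyPriceCalculationKeepsItsCurrentOutput() {
        LegacyPriceEngine engine = new LegacyPriceEngine();
        assertEquals(1337, engine.calculate("order-42", 3, true, null, 0.25));
    }
}
```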

Naturally, we want our test code to lie in the specification category. But if we have a piece of code that is really hard to test, we can’t just start refactoring blindly. So we write the best tests we can, and refactor the code and tests in parallel. After some work, the production code is in better shape and we have moved our test code into the specification category.

When you write test and production code from scratch, you have every opportunity to write good tests. Make sure you make your code human-readable. The computer will do just fine. :)

Measure Your Code Un-coverage

Maximizing code coverage is not the way to maximize the benefits of unit testing. Instead (1) identify the most important user scenarios and (2) measure and analyze the code “un-coverage”.

Why do we unit test? There are a multitude of reasons. We want to feel confident our code works and we want fast feedback. Other benefits are related more to the structure of the code. Good tests imply good testability, which in turn implies proper decoupling, user-friendly APIs and so on. But what do we unit test?

Code coverage is a metric often tied to unit testing. Code coverage can mean many things. The most commonly used metric is “line coverage”, which compares the number of code lines exercised by your tests with the total number of lines (normally expressed in ELoCs, effective lines of code: code lines excluding blank lines, comments and curly braces). All coverage metrics work like this, comparing what is exercised (e.g. ELoCs) with the corresponding total. Other coverage metrics are “function coverage”, “branch coverage” and “path coverage”.

Having a 60% line coverage means your test execution has touched 60% of your code base. It means close to nothing, however, in terms of quality assurance. You could hit 60% of the code lines and still miss the majority of your most important user scenarios. On the flip side, having a 60% line coverage also means 40% of the code lines have never been executed. Now this is useful information. If a line of code has never been executed, there is a chance the code cannot even be executed without crashing. Or formatting the hard drive, you never know.

So instead of focusing on covering your code with tests, follow this procedure:

  1. Identify the most important user scenarios and alternate paths and implement them as unit tests (or tests on other levels if more appropriate, e.g. component, integration or system level).
  2. Measure your code “un-coverage” and observe which parts of the code are never touched by your test cases. Come up with user scenarios that will exercise the un-covered code. Write tests.
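
As a concrete way to do step 2, most coverage tools can emit a machine-readable report that you can mine for never-executed lines. The sketch below assumes a JaCoCo XML report; the attribute names (such as ci for covered instructions) follow JaCoCo’s report format as I understand it, so treat the details as assumptions and adapt them to your tool.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class UncoverageReport {
    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // JaCoCo's XML report references a DTD; skip fetching it so parsing works offline.
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        Document report = factory.newDocumentBuilder().parse(new File(args[0]));

        NodeList sourceFiles = report.getElementsByTagName("sourcefile");
        for (int i = 0; i < sourceFiles.getLength(); i++) {
            Element sourceFile = (Element) sourceFiles.item(i);
            NodeList lines = sourceFile.getElementsByTagName("line");
            for (int j = 0; j < lines.getLength(); j++) {
                Element line = (Element) lines.item(j);
                // "ci" = covered instructions; zero means the line was never executed.
                if (Integer.parseInt(line.getAttribute("ci")) == 0) {
                    System.out.println(sourceFile.getAttribute("name") + ":" + line.getAttribute("nr"));
                }
            }
        }
    }
}
```

Run it against the report your build produces (for a Maven build this is typically something like target/site/jacoco/jacoco.xml) and you get a plain list of un-covered lines to reason about.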

You might still have uncovered code after this procedure. Obtaining 0% code un-coverage (100% code coverage) is expensive. All testing is a trade-off between the risk of releasing something broken and the cost of testing it. When you have covered what is important to your end-user and analyzed the remaining un-covered code, your tests are in great shape functionality-wise, even if your coverage is nowhere near 100%.

As a side note: in the case of test-driven development, we will end up with close to 0% un-covered code. But with TDD, step (1) is more important than ever. It is easy to get carried away focusing on covering each line of production code with tests and lose focus on what’s important: testing the right thing, i.e. the things that are most important for your end-users. Happy testing.

How Do We Know Our Tests Work Tomorrow?

When writing a test case, good practices suggest that we verify the test can fail. But how can we know the test code doesn’t break later?

How do we know our test code works? It helps to apply a test-first mentality to the largest extent possible. This means writing a failing unit test followed by writing production code to pass the test. As a bonus, “failing first” verifies that the test does what it is meant to do: that it verifies the behavior of your code. We make sure the test actually can fail when the code is broken. As with any code, test code is expected to have bugs in it, so the correctness of your test case needs to be verified. All good.
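
A minimal illustration of that rhythm, with hypothetical names: start from a stub that makes the test fail, watch it fail, then implement the real rule.

```java
import org.junit.Test;
import static org.junit.Assert.*;

public class PinValidatorTest {

    // Written before the production code. With PinValidator stubbed to accept
    // everything, this test fails first, proving that it actually can fail.
    @Test
    public void rejectsPinShorterThanFourDigits() {
        assertFalse(new PinValidator().isValid("123"));
    }

    // Next step: implement just enough of PinValidator to make the test pass.
}
```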

Robert Martin compares unit tests to double-entry bookkeeping, which is a nice analogy: “Accountants who don’t hold to the GAAP [Generally Accepted Accounting Principles] tend to wind up in another profession, or behind bars. Dual Entry Bookkeeping is the simple practice of entering every transaction twice; once on the debit side, and once on the credit side.” The tests (debit) make sure the production code (credit) is correct, while the production code makes sure the test is correct. At least this holds when first writing the test code using the test-first methodology above. But does it apply later on?

I have said the following on several occasions: “How can this not work? We have a test case covering exactly this!” Digging deeper, it turns out that the test code broke at some point, and I dare not check the repository history to see how long ago. Maybe the code was refactored, and an assert was somehow invalidated or removed. (Examples in funny languages: I had a bash script testing an app, and I accidentally removed an “exit 1” statement, causing the script to never fail. Found out by accident when the app was malfunctioning! On another occasion, I wrote this in JavaScript: “test.notEquals(200, response.statusCode)”. I later refactored it to “status_code”. The problem was that in the production code I had renamed the field to “status” (doh!), so “response.status_code” evaluated to “undefined”, and the assert could never fail. Found out by accident when the app was malfunctioning! I also have numerous examples in Java…) We have test code to make sure our production code works. The test code is meta code, looking at the real code. But what mechanisms are there to guarantee that the test code can still fail? What about the meta-meta level?
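
Translated into Java (the response map and the field names are only for illustration), the JavaScript mishap above boils down to an assert that compares against a value that can no longer be anything but null:

```java
import java.util.HashMap;
import java.util.Map;
import org.junit.Test;
import static org.junit.Assert.*;

public class SilentlyBrokenAssertTest {

    @Test
    public void errorResponseShouldNotLookOk() {
        Map<String, Object> response = new HashMap<String, Object>();
        response.put("status", 500);  // the production code now uses the key "status"...

        // ...but the test still asks for "status_code", so get() returns null,
        // and 200 is never equal to null: this assert can no longer fail.
        assertNotEquals(200, response.get("status_code"));
    }
}
```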

Introducing the “test case test case” (meta level 2): I was thinking about using dependency injection to feed my test case (meta level 1) with a mock application (faking meta level 0). Perhaps we could feed the mock application with erroneous behavior in order to trigger the asserts in the test case (meta level 1). Fantastic! But this makes me concerned about the correctness of the “test case test case”. Hmm, better go into meta level 3. Ok ok, sort of kidding of course.
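
Half-joking or not, the dependency injection idea can be sketched. Everything below is hypothetical: the check at meta level 1 is written against an abstraction of the application, and the meta level 2 test feeds it a deliberately broken fake and verifies that the assert actually fires.

```java
import org.junit.Test;
import static org.junit.Assert.*;

public class TestCaseTestCase {

    // Abstraction of the application (meta level 0).
    interface App {
        int statusCode();
    }

    // Meta level 1: the ordinary check, written against the abstraction.
    static void statusShouldBeOk(App app) {
        assertEquals(200, app.statusCode());
    }

    // Meta level 2: verify that the check above can still fail,
    // by injecting an app with deliberately wrong behavior.
    @Test(expected = AssertionError.class)
    public void statusCheckShouldFailForBrokenApp() {
        statusShouldBeOk(new App() {
            public int statusCode() { return 500; }
        });
    }
}
```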

So that was half a joke, but I’m serious about the problem. I see this over and over, and it bothers me that we have tests but cannot be sure they do what they should. Over time, I would not be surprised if more and more test cases break in most software projects. Sure, I sometimes go into my production code and “return zero instead of 200”, just to make sure something breaks in the test suite, but it doesn’t really scale… I want regression tests for my test cases! :) Any ideas?

Characteristics of a Software Professional

At work, I have been challenged with the question “What are the most important characteristics of a software developer?”. This is a tough question, and no matter what you decide to include, you have to leave something out.

I’ve been part of software development projects in various companies. The successful projects teach you invaluable lessons. The dysfunctional projects even more so. Inevitably, a list of characteristics would include qualities I value and desire in my fellow colleagues, as well as characteristics I’d expect them to want in me. (So this is also a long todo list for me. ;)

To get some kind of structure, I decided on three main categories: Professionalism, Long-term code and Quality mindset.

Professionalism

Take responsibility

You are a professional developer, and professionals act responsibly. If things go wrong, take responsibility. Understand why things went bad. Learn, adapt and make sure it never happens again. When faced with a difficult choice, “do the right thing”. Optimize for the long run even if it results in more work today. Be a team player, even if this means saying no. Speak the truth.

Know your product

We’re part of a business. Without successful products, there will be no business. Know your users and their needs. Use your product, use competitors’ products, visit customers and watch them use your product (from purchase/download to installation to day-to-day usage to upgrade to uninstall and so forth).

Continuous learning

Be humble and practice relentless inquiry. Question everything (since everything has potential for improvement), embrace questions about your code, realize others’ suggestions might improve your work, and be happy when other developers change “your” code (it means the code is clear enough for others to understand!). Improve yourself and others. Become a better developer by reading books, papers, blogs etc. Watch instruction videos, listen to podcasts, try new technologies, participate in open source development, discussion groups etc. Discuss your code and what you read with your peers. Talk to developers of other specialities. Teach others what you know well.

Long-term code

Communication

You write a line of code once, but it is read hundreds of times. Invest time in writing code that is easy for others to read. Write code at the right level of abstraction: abstract enough for expressiveness, but without hiding necessary detail. Adhere to design principles, as they capture proven ways of achieving high-quality code. Apply design patterns to better communicate your intent.

Maintainability

Decouple the different parts of your software, on every level – sub-system, module, class and function. Write extendable code, so that you can add functionality with minimal change to existing code. Avoid technical debt, and repay debt as soon as possible. Interest has to be paid on all debt.

Proven functionality

Never deliver code unless you’ve proven it works. If you don’t test it, it will be faulty. Write testable code. Make it testable from the start; later it will be too expensive. Automate your tests to run before check-in, after check-in and nightly. If the tests are not executed automatically, they will not be updated and will soon be obsolete. Without automated tests, no one will dare change any code. Write fast and reliable unit tests. Write realistic integration tests. For each new feature, write acceptance tests. Automate everything.

Quality mindset

Quality is your responsibility

You, as a developer, are responsible for the quality of the product. Other roles can help you spot problems or give you more time, but they cannot put quality into the product for you. Never ship anything without being certain of its correctness.

Find bugs early

Find bugs early in the development process. If a bug can be found by the developer, it should be. If you need tools, get them. If you need hardware, get it. If a bug is found late, understand why it was not found earlier. Fix the process so that bugs of this kind never slip through. Automate.

Fix it now

If you find a bug, fix it now instead of filing a bug report. Ask your colleagues to help out. You will save time. File bug reports only for things that can’t be solved within half a day. Do things properly the first time. If you don’t have time to do it right today, when are you ever going to find time? Give time estimates that allow you to produce quality products. Think about what is stopping you from being more productive. Fix it, and then move on to the next thing stopping you.

If you’re interested in these things, you should have a look at the Poppendiecks’ “Lean Software Development” books, Senge’s “The Fifth Discipline”, Martin’s “The Clean Coder”, McConnell’s “Code Complete”, McConnell’s video “World Class Software Companies” and others. I also provide some resources in previous blog posts (e.g. videos and books).

What characteristics do you value in a software developer?

Behavior Driven Development

Behavior Driven Development (BDD) is a flavor of Test Driven Development (TDD). In BDD, we have a specification focus instead of a test focus. What does that mean? And how can it help us write better software?

I just watched Dave Astels’ talk “Beyond Test Driven Development: Behaviour Driven Development” (video). The first time I ran across BDD, it struck a chord. It is a branch of TDD that suits my style. And as always, hearing someone else’s viewpoint makes you think about your own practices. I definitely learned a lot from the video, which describes the RSpec BDD framework (for Ruby). Let’s review some highlights of BDD.

As mentioned, BDD is related to TDD. In the video, Astels stresses the fact that we should try to get away from the testing mindset and focus on writing specifications of the system behavior instead. In BDD, we have a specification focus instead of a test focus. The name of a BDD test case should read as if it specifies a small piece of system behavior. For example, let’s say you’re writing software for a web shop (as in the Wikipedia article on BDD). One test case could be named “refunded items should be returned to stock” (or refundedItemsShouldBeReturnedToStock). The test code is there to clarify the details of the specification.

From a philosophical standpoint, the word “test” makes me think of testing something existing, while “specification” makes me think of specifying the behavior of something not yet built. It might help your thought process when decoupling the test cases (= the specification) from the underlying implementation. This ties into one of the drawbacks we often see with traditional unit testing. A recurring pattern is e.g. a ShippingOrder class with a corresponding ShippingOrderTests test class. This tightly couples the test code with the production code. Refactoring becomes painful as test classes might become obsolete if you decide to split a production code class in two.

Instead, Dave Astels mentions that in BDD, you should organize your test classes not around production code classes, but around test fixtures. Each test class has a test fixture, and that fixture captures a specific system setup. (In short, a fixture creates objects from your production code classes in a setUp() function and cleans everything up in a tearDown() function. The setUp() function is executed before each test case in the test class and tearDown() is executed after each test case.) Thus, all tests in a test class would run against the same system setup. For example, if several test cases revolve around returning or refunding items in your shop, create a RefundItemTests test suite.
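
A sketch of that fixture-centric organization in JUnit 4 style (Shop and Order are invented for illustration): the class is named after the fixture, and every test runs against the same setup of a shop with one sold item.

```java
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import static org.junit.Assert.*;

// Organized around a fixture (a shop with a purchased item),
// not around a single production code class.
public class RefundItemTests {

    private Shop shop;
    private Order order;

    @Before
    public void setUp() {
        shop = new Shop();
        shop.stockItem("black sweater", 4);
        order = shop.sell("black sweater");
    }

    @After
    public void tearDown() {
        shop.close();
    }

    @Test
    public void refundedItemsShouldBeReturnedToStock() {
        shop.refund(order);
        assertEquals(4, shop.itemsInStock("black sweater"));
    }

    @Test
    public void refundedOrdersShouldNotAppearInSalesReport() {
        shop.refund(order);
        assertFalse(shop.salesReport().contains(order));
    }
}
```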

Since system behavior is not restricted to a single method or class, the notion of “unit” in “unit testing” needs to be widened. Any reasonable subset of the code could participate in the test fixture (even the full system, as described in my post on edge-to-edge unit testing). I like that, since it allows you to find and test against the stable boundaries within your system, thus making the tests less brittle. This should be contrasted with having a one-to-one correspondence between production and test classes, which will be very brittle as the code evolves.

If you have good or bad experiences with BDD, or just random thoughts on unit testing, please drop a line below.

Sustainable Software Development, Part 1: Managing Technical Debt

Software projects sometimes go bad. The pace of development is not sustainable. To achieve sustainable software development, we need to keep our focus on what’s important: the long-term health and maintainability of our source code.

Robert Martin has written a nice article on Scrum projects, and how they often start out with hyper-productivity. However, most projects slow down when code size increases. After a while, we might even experience that progress approaches zero as time approaches infinity (if the project is not discontinued before then :).

Why is that? Martin lists a number of activities that are well known to the experienced developer, such as unit testing, continuous integration and code coverage measurements. Most developers would even agree that applying them will help retain productivity over the long run. Still, the law of least resistance often takes us down a different path. A strong focus on feature growth is very seductive, as it makes everybody happy in the short term: the customer is happy about their new features, the product owner is happy when he can satisfy the market, and the team is happy when everybody commends them for their work. At least for a short while. The lack of focus on quality (such as a lack of automated unit tests and a poor continuous integration setup) will catch up with you.

Technical debt is all the things you know you must do but haven’t done yet: fixing a bug, adding more unit tests, refactoring, cleaning up etc. Just like normal debt, it has to be paid back. No surprise there. Unfortunately, just like normal debt, technical debt also comes with an interest rate. You pay interest every time you say “aargh, this would have taken me ten minutes had I refactored this module” or “aargh, I just spent half a day tracking down a problem caused by a known bug”. Discussing, planning or thinking about what must be fixed is paying interest. Soon, your house belongs to the bank. Software development like this is not sustainable – it will come to a halt.

To avoid this, and to achieve sustainable software development, we need to keep our focus on what’s important: the long-term health and maintainability of our source code. Alongside the debt analogy, there’s a savings analogy. Invest some money in a savings account (or the stock market), and you will receive interest instead of paying it. As a start, it is necessary to make an investment in good practices: fix things now, and use tools and techniques to ensure quality. You will be able to deliver new features for a long time and your velocity will stay high, or even increase over time.

Object-Oriented Programming Lecture at KTH: Slides etc.

Thank you to everyone who participated in my lecture at KTH on October 26, 2011. I had a lot of fun, and we had some good discussions. For those of you who were not there, it was about object-oriented programming and how to write good code. We used some example code to discuss OO principles and testing. Here’s the presentation (pdf)!

JUnit Max Takes Test-Driven Development to the Next Level

Automatic compilation as you type is useful, but can we take it further? JUnit Max automates the execution of your unit tests.

In the good old days, you would build your software by running make from the command line. The output would tell you where the errors were, and it would be up to you to find the file and line to fix the error. Nowadays, most IDEs are somewhat better. Normally, you press a key to build. The IDE then helps you navigate the errors by taking you to the file and line. This still involves a context switch: you have to stop writing code in order to press a key, and subsequently to look at the error messages. Do this 50 times per day, and the disruption of the context switches becomes noticeable.

Modern IDEs offer another improvement. They run the compiler automatically in the background for you. For example, Eclipse will compile the code while you’re typing and show the errors underlined in red, or as a red icon on files that fail to compile. This removes much of the context switch mentioned above. This, combined with the immediate feedback, leads to fewer interruptions. This is an improvement.

One role of your compiler is actually that of a test tool. In a sense, it tests your code for a set of very specific problems, such as type mismatches, undefined functions etc. It might even warn you about some null pointer issues that would normally surface during execution. The compiler knows a lot about your code, and this is also why it makes sense to turn on “treat warnings as errors”. Without it, warnings will go unnoticed, and you will never have a clean build.

After the compiler tests, the next level of tests are the unit tests. Most IDEs allow you to run your unit tests by the press of a button. Again, you will have the task switch described above. Furthermore, the results are often not as integrated into the IDE as the compiler output. But if the code compiles automatically when we write our code, why can’t the unit tests run too, nice and clean and fully integrated?

This is probably the question Kent Beck asked himself when he came up with the idea of JUnit Max. JUnit Max is an Eclipse plugin designed to help test-driven development. When you save a file, JUnit Max executes the unit tests for you. Results are shown as red icons on failed tests. Since big projects can have thousands of unit tests, JUnit Max contains some extra logic to order the execution of the tests. First of all, it will run the fastest of your unit tests first. This will provide you with fast feedback. Second, it will run those of your tests that failed recently. Those tests are more likely to fail than tests that last failed a long time ago. Using JUnit Max for test-driven development is awesome. Just save your files and it will provide you with instant feedback. I use it when I do “acceptance unit test-driven development” (see post) and it really speeds things up.
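
This is not JUnit Max’s actual code, just a sketch of the kind of ordering described above; how the two heuristics are combined and weighted in the real plugin is not something I know. The sketch prefers recently failed tests and breaks ties by running the fastest ones first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TestPrioritizer {

    // Hypothetical record of what we know about a test from previous runs.
    static class TestRecord {
        final String name;
        final long lastDurationMillis;
        final long lastFailureTimestamp;  // 0 if the test has never failed

        TestRecord(String name, long lastDurationMillis, long lastFailureTimestamp) {
            this.name = name;
            this.lastDurationMillis = lastDurationMillis;
            this.lastFailureTimestamp = lastFailureTimestamp;
        }
    }

    // Most recently failed tests first; ties broken by running the fastest tests first.
    static List<TestRecord> prioritize(List<TestRecord> previousRuns) {
        List<TestRecord> ordered = new ArrayList<TestRecord>(previousRuns);
        ordered.sort(Comparator
                .comparingLong((TestRecord t) -> -t.lastFailureTimestamp)
                .thenComparingLong(t -> t.lastDurationMillis));
        return ordered;
    }
}
```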

So, what is the next level of tests after unit tests? What is the next level of tests that would be beneficial to execute when you press “save”? How about running some functional acceptance tests automatically? It would probably require a full and successful build. The same prioritization between tests as above would be helpful: run the fast and the shaky tests first. Running acceptance tests automatically could work, and could be useful.

What’s next? Load tests? Full integration tests of a system? And after that, deploy automatically? Imagine, you change a single line of code, save, and minutes later, the changes are deployed live. Now that is what I call continuous deployment! Ok, I agree, deploying without committing to a source repository is probably not what you want. :) (I guess for deployment, doing it 50 times per day is cool enough.)

Black Box Programming

Why do software developers focus so much on the inside of the system when what we really want is to correctly implement the system as seen from the outside? Is it possible to first write the code for the external behavior, and then tweak the inside? Maybe, but we might need to rethink.

I’ve developed reactive systems with asynchronous input for many years, such as telephony systems. Over time, I have become increasingly puzzled by the way we develop these systems. A few years ago, I realized what was bothering me.

In the development of a reasonably complex piece of software, there are at least three roles involved: requirements engineering, software engineering and quality engineering. The requirements people look at the system from the outside. They view the system as a black box and define the behavior of the system by its incoming and outgoing signals (as well as non-functional requirements). Testers also look at the system from the outside. They inject signals and verify expected outgoing signals (as well as non-functional requirements). The software developer wants to implement a system that corresponds to the requirements. Thus, it would be natural for the developer to think of the system as a black box (ignore the internals!). He would start out by implementing the observable behavior, perhaps by describing the incoming and outgoing signals of the system as a state machine. For most non-trivial systems, this is not what we do.

Instead, the programmer focuses on what is on the inside of the system. Implicitly, and perhaps scattered over hundreds of thousands of lines of code, various sub-routines define the logic and outgoing signals of our system. The externally observable behavior is a side-effect of these sub-routines. In any non-trivial system, it is almost impossible to verify the correctness by visual inspection. We make our best effort to test our software to ensure the correct behavior, at the cost that this incurs. Focusing on the inside of the system is only natural, since many factors affect the source code of our system (e.g. non-functional requirements such as performance, scalability and robustness). Natural or not, it results in systems that are difficult to implement correctly. Thus, the question here is: can we do anything about it? Would it be possible to explicitly describe the externally observable behavior in a single place in our code while still being able to satisfy non-functional requirements?

What would this code look like? It would describe the logic of the system: incoming and outgoing signals and the necessary control flow, along with some house-keeping data. Let’s call it the Black Box description of the system. It would not contain any implementation details (threading, database storage, performance tweaks, etc.) since implementation details are not necessary to describe the external functional behavior of the system. Everything would be described in the domain language. Let’s take an example: assume we are implementing an ATM machine with this behavior. It would have incoming and outgoing signals towards the user interface (user input and text on the display), the bank’s server (requests and responses) and the ATM machine itself (card inserted, eject card etc.). The number of failed PIN attempts would be stored in a variable (house-keeping data). The “Too many invalid PINs” transition (here) would read this variable.

We write the Black Box code so that it is executable in isolation. This means the Black Box code must only talk to abstractions, and never directly to code that contains implementation details of the system (networking, platform specifics, optimizations etc.). Executable in isolation also means it will be unit testable in isolation. By testing the behavior of the Black Box code, we can verify the logic of the system. Obviously, other kinds of tests are required to verify the full implementation (state stored in databases, networking behavior etc., not to mention non-functional requirements). An executable Black Box would be a tremendous advantage. We would not need an implementation to be able to try out our system. Testers could start verifying and integration tests could begin very early on. We could also do rapid prototyping.

Many systems can be implemented simply by a state machine like the one in the ATM example. But large systems tend to be much more complex than that. First of all, we need to be able to address non-functional requirements. There are also some implementation issues to consider. I wouldn’t be able to list all challenges, but to get a feeling for how some of the issues can be addressed, let’s mention a couple of them:

  • Asynchronicity: For example, we are writing a system that requires authentication. As an implementation detail, we may choose to query another machine over the network. Thus, asynchronous results are inevitable. We don’t want the Black Box to reveal this (since our authentication procedure does not concern the end user). In our implementation, we could adapt our Black Box state machine by introducing a sub-state machine where we wait for the response from the other machine. This requires us to be able to extend or substitute a state with a sub-state machine.
  • Performance: What if multiple threads are involved in the execution of the Black Box? We might have to divide the state machine so that parts of it are executed in one thread and parts in another thread (or even in different processes or machines). Here, too, we need asynchronous communication between threads, much like the above. We also need the ability to execute only parts of the state machine.
  • State: Assume the flow of the state machine does not depend only on input signals but also on some other piece of data. For example, it could be a database query that answers whether a user is registered or not. Somewhere in the Black Box description we might call a function isUserRegistered(). In the implementation, we will use a real database. When executing the Black Box during testing, we let isUserRegistered() return pre-determined values for different test cases, very much like a mock object.
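
To make this tangible, here is a small hypothetical sketch (all names invented): the Black Box logic only talks to an abstraction, UserRegistry, so the same code can run against a real database in production and against a canned fake in a unit test, exactly as in the isUserRegistered() example above.

```java
import org.junit.Test;
import static org.junit.Assert.*;

public class BlackBoxSketch {

    // Abstraction over house-keeping state; the implementation detail
    // (a real database, a network call) lives behind it.
    interface UserRegistry {
        boolean isUserRegistered(String userId);
    }

    // The Black Box: incoming signal -> state transition -> outgoing signal.
    static class LoginBox {
        enum State { IDLE, LOGGED_IN, REJECTED }

        private State state = State.IDLE;
        private final UserRegistry registry;

        LoginBox(UserRegistry registry) { this.registry = registry; }

        // Incoming signal: a login request. The return value is the outgoing signal.
        String onLoginRequest(String userId) {
            if (registry.isUserRegistered(userId)) {
                state = State.LOGGED_IN;
                return "WELCOME";
            }
            state = State.REJECTED;
            return "UNKNOWN_USER";
        }

        State state() { return state; }
    }

    // Unit testing the Black Box in isolation: the registry is a pre-programmed fake.
    @Test
    public void unknownUsersShouldBeRejected() {
        LoginBox box = new LoginBox(userId -> false);  // fake: nobody is registered
        assertEquals("UNKNOWN_USER", box.onLoginRequest("alice"));
        assertEquals(LoginBox.State.REJECTED, box.state());
    }
}
```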

To implement larger systems, we would combine the Black Boxes of our sub-systems into a larger whole. We would build a hierarchy of Black Boxes. Black Boxes would communicate with each other through incoming and outgoing signals, which translate into asynchronous signals or synchronous function calls, whichever is most appropriate. The combined system would also be unit testable. Unit testing the combined system would exercise the sub-systems, since the combined system’s behavior relies on the behavior of the sub-systems.

The idea of a Black Box description is definitely not new. There are tools for model-driven design that come very close to what is described above. For example, in Mentor Graphics’ BridgePoint, you can describe your Black Box behavior as a state machine and generate customized code by using what is called a model compiler. I’ve seen several successful projects built on BridgePoint, so the concept seems viable. But most programmers feel most comfortable when the code is at the center of things. You want full control of your code. This may also be a contributing reason why model-driven design has not taken off. So, the question here is really what can be done without advanced tools.

I developed a framework for Black Box Programming a few years ago. It addressed the challenges described above (e.g. replacing a state or transition with a sub-state machine, execution of parts of a state machine, unit testing) and had a couple of nice features (e.g. random walks in the state machine, execution of the unit tests against the real system). So I think it is possible to develop software like this. The question is just whether it is practical. I think the best way to get a feeling for Black Box Programming is to try it out. If you have a project in mind that can be open-sourced, get in touch and we’ll try it out together!