Was This Test Written to Specify or Fixate Behavior?

When you read unit test code, have you ever wondered why some tests tell a story (easy to read) while others just feel mechanical (more readable to a computer than to a human)?

While I have no definitive answer on the subject of human-readable versus machine-readable unit tests, here is an insight that came to me and a colleague some time ago: some tests are written as a specification, and some tests are written to fixate existing behavior. Let me explain what I mean.

When you write a test as a specification, the test code is normally written up front or together with the production code. The purpose of the test code is to document the behavior of the code, so the developer makes an effort to keep the test code human-readable. The tests are often used as acceptance criteria, to verify that a piece of functionality was implemented correctly. The fact that you get an executable specification is a great bonus, since the test code would be valuable even just as an example.
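
As a rough sketch (Java with JUnit 5, against a made-up ShoppingCart class, not code from any real project), a specification-style test might read something like this:

    import static org.junit.jupiter.api.Assertions.assertEquals;

    import java.util.ArrayList;
    import java.util.List;

    import org.junit.jupiter.api.Test;

    // Specification-style sketch: the test names and bodies read like
    // requirements for a (made-up) ShoppingCart class, included below in
    // minimal form so the example is self-contained.
    class ShoppingCartSpecTest {

        @Test
        void emptyCartHasZeroTotal() {
            assertEquals(0, new ShoppingCart().totalInCents());
        }

        @Test
        void totalIsTheSumOfTheItemPrices() {
            ShoppingCart cart = new ShoppingCart();
            cart.add("book", 500);  // 5.00
            cart.add("pen", 150);   // 1.50
            assertEquals(650, cart.totalInCents());
        }

        // Minimal implementation so the sketch compiles on its own.
        static class ShoppingCart {
            private final List<Integer> pricesInCents = new ArrayList<>();

            void add(String name, int priceInCents) {
                pricesInCents.add(priceInCents);
            }

            int totalInCents() {
                return pricesInCents.stream().mapToInt(Integer::intValue).sum();
            }
        }
    }

The names and bodies read like requirements; the test would be worth keeping even as a plain example.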

Other tests are written to fixate the existing behavior of the code. Typical examples are tests written long after the production code, in order to cover legacy code with tests. Legacy code can be hard to test, so the test code is difficult to write in a human-readable way and tends to be more “mechanical” in nature. The focus of these tests is execution, since their purpose is to make sure refactoring does not break existing behavior.
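
A fixation-style test, again only a sketch against a made-up stand-in for a legacy class, might look more like this:

    import static org.junit.jupiter.api.Assertions.assertEquals;

    import org.junit.jupiter.api.Test;

    // Fixation-style sketch: the expected string was simply captured from a run
    // of the current code and pasted in. The test pins the behavior down, but a
    // reader learns very little about why the output looks the way it does.
    class LegacyReportFormatterTest {

        @Test
        void formatProducesTheSameOutputAsBefore() {
            String report = LegacyReportFormatter.format("Q3", 42, null, false);
            assertEquals("RPT;Q3;42;;0", report);  // captured "golden" value
        }

        // Stand-in for a much larger legacy class, so the sketch is self-contained.
        static class LegacyReportFormatter {
            static String format(String period, int count, String owner, boolean flag) {
                return "RPT;" + period + ";" + count + ";"
                        + (owner == null ? "" : owner) + ";" + (flag ? 1 : 0);
            }
        }
    }

The test executes the code and locks in whatever it currently does, which is exactly its job, but it tells no story.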

Naturally, we want our test code to lie in the specification category. But if we have a piece of code that is really hard to test, we can’t just start refactoring blindly. So we write the best tests we can, and refactor the code and tests in parallel. After some work, the production code is in better shape and we have moved our test code into the specification category.

When you write test and production code from scratch, you have every opportunity to write good tests. Make sure you make your code human-readable. The computer will do just fine. :)

How Do We Know Our Tests Work Tomorrow?

When writing a test case, good practices suggest that we verify the test can fail. But how can we know the test code doesn’t break later?

How do we know our test code works? It helps to apply a test-first mentality to the greatest extent possible: write a failing unit test, then write the production code to make it pass. As a bonus, “failing first” verifies that the test does what it is meant to do: it verifies the behavior of your code, and we have made sure the test actually can fail when the code is broken. As with any code, test code is expected to have bugs in it, so the correctness of your test case needs to be verified. All good.
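
To make the “failing first” step concrete, here is a sketch in Java/JUnit (a made-up PrimeChecker, not from any real codebase): the test is written first, and the deliberately wrong stub guarantees we actually see the test go red before we make it go green.

    import static org.junit.jupiter.api.Assertions.assertFalse;
    import static org.junit.jupiter.api.Assertions.assertTrue;

    import org.junit.jupiter.api.Test;

    // Test-first sketch: the tests are written before PrimeChecker is really
    // implemented. Against the stub below, sevenIsPrime fails, which proves
    // the test can fail; only then do we write the real production code.
    class PrimeCheckerTest {

        @Test
        void sevenIsPrime() {
            assertTrue(PrimeChecker.isPrime(7));
        }

        @Test
        void eightIsNotPrime() {
            assertFalse(PrimeChecker.isPrime(8));
        }

        // Initial stub: just enough to compile, wrong on purpose.
        static class PrimeChecker {
            static boolean isPrime(int n) {
                return false;  // watch the first test fail before fixing this
            }
        }
    }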

Robert Martin compares unit tests to dual-entry bookkeeping, which is a nice analogy: “Accountants who don’t hold to the GAAP [Generally Accepted Accounting Principles] tend to wind up in another profession, or behind bars. Dual Entry Bookkeeping is the simple practice of entering every transaction twice; once on the debit side, and once on the credit side.” The tests (debit) make sure the production code (credit) is correct, while the production code makes sure the tests are correct. At least this holds when the test code is first written, using the test-first methodology above. But does it apply later on?

I have said the following on several occasions: “How can this not work? We have a test case covering exactly this!” Digging deeper, it turns out that the test code broke at some point, and I dare not check the repository to find out how long ago. Maybe the code was refactored and an assert was somehow invalidated or removed. (Examples in funny languages: I had a bash script testing an app, and I accidentally removed an “exit 1” statement, causing the script to never fail. Found out by accident when the app was malfunctioning! On another occasion, I wrote this in JavaScript: “test.notEquals(200, response.statusCode)”, which I later refactored to “status_code”. The problem was that in the production code I had renamed the field to “status” (doh!), giving “response.status_code” the value “undefined”, which means the assert could never fail. Found out by accident when the app was malfunctioning! I also have numerous examples in Java…) We have test code to make sure our production code works. The test code is meta code, looking at the real code. But what mechanisms are there to guarantee that the test code can still fail? What about the meta-meta level?
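
For what it’s worth, the same trap exists in Java whenever the value is looked up dynamically. This is only an illustrative sketch of the failure mode, not the actual code from that incident:

    import static org.junit.jupiter.api.Assertions.assertNotEquals;

    import java.util.Map;

    import org.junit.jupiter.api.Test;

    // Sketch of the trap: the production code now stores the value under
    // "status", so the lookup below always yields null, and an inequality
    // assert against null can never fail, however broken the application is.
    class StatusCodeTest {

        @Test
        void errorResponseDoesNotReturn200() {
            Map<String, Object> response = Map.of("status", 500);   // produced elsewhere
            assertNotEquals(200, response.get("status_code"));      // always passes: null != 200
        }
    }

An inequality assert against a value that is always null (or undefined) can never fail, which is exactly what makes it so quiet.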

Introducing the “test case test case” (meta level 2): I was thinking about using dependency injection to feed my test case (meta level 1) with a mock application (faking meta level 0). Perhaps we could feed the mock application with erroneous behavior in order to trigger the asserts in the test case (meta level 1). Fantastic! But this makes me concerned about the correctness of the “test case test case”. Hmm, better go into meta level 3. Ok ok, sort of kidding of course.
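
If you want to humor the idea for a moment, a sketch of such a “test case test case” could look like this in Java/JUnit (the Application and Response types are made up for the example):

    import static org.junit.jupiter.api.Assertions.assertEquals;
    import static org.junit.jupiter.api.Assertions.assertThrows;

    import org.junit.jupiter.api.Test;

    // Half-serious sketch of meta level 2: the actual check is extracted into a
    // method that takes the application as a parameter (dependency injection),
    // so we can also run it against a deliberately broken fake and demand a failure.
    class HealthCheckTestTest {

        // Meta level 1: the real check, parameterized over the application.
        static void verifyHealthEndpointReturns200(Application app) {
            assertEquals(200, app.handle("/health").statusCode());
        }

        // Meta level 2: feed the check a broken application; the check must fail.
        @Test
        void checkFailsWhenTheApplicationIsBroken() {
            Application broken = path -> new Response(500);
            assertThrows(AssertionError.class, () -> verifyHealthEndpointReturns200(broken));
        }

        // Made-up minimal types, just for the sketch.
        interface Application { Response handle(String path); }
        record Response(int statusCode) {}
    }

It works, but of course nothing guards meta level 2, which was the joke.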

So that was half a joke, but I’m serious about the problem. I see this over and over, and it bothers me that we have tests but cannot be sure they do what they should. Over time, I would not be surprised if more and more test cases break in most software projects. Sure, I sometimes go into my production code and “return zero instead of 200”, just to make sure something breaks in the test suite, but that doesn’t really scale… I want regression tests for my test cases! :) Any ideas?

Test-Driven Development Done Right

A couple of years ago, I had at least two misconceptions about Test-Driven Development: (1) you should write all your tests up-front and (2) after verifying that a test case can fail, you should make it pass right away. To better understand TDD, I got a copy of the book “Growing Object-Oriented Software Guided by Tests” by Freeman and Pryce (now one of my absolute favorites). Although the book does a great job explaining the concepts, it took me ten chapters to admit I had been wrong. Let’s never do that again. :)

Let me walk you through my misconceptions so that you don’t have to repeat my mistakes:

Misconception 1: write all tests up-front. Thinking about potential test cases up-front is not a bad thing. It exercises your imagination, and with some luck many of them will still be applicable after the code is written. But don’t waste energy trying to compile an exhaustive list of tests. At least for me, this approach didn’t work, since my imagination is too limited to come anywhere near the final list. Above all, I would rather get going writing some test code!

You are better off writing a few happy-path test cases. Filling in the test code will get you started working on the user interface of your classes. When the test code starts acting as a “user” of your interface, it will be obvious to you whether the API is okay or awkward to work with. The tests will drive you to improve your user interface. Creating the user interface will invariably make you think about error cases and how the API can be abused or misunderstood. You will come up with more test cases, and implement these. With some effort, but surprisingly little so, you will grow your test suite.
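
To illustrate, a first happy-path test (a made-up UrlShortener, sketched in Java/JUnit) is often all it takes to feel whether the API is pleasant to work with:

    import static org.junit.jupiter.api.Assertions.assertEquals;

    import java.util.HashMap;
    import java.util.Map;

    import org.junit.jupiter.api.Test;

    // Sketch of a first happy-path test acting as the first "user" of a made-up
    // UrlShortener. Writing this call site is what exposes an awkward API: if
    // the test needed three collaborators and a builder just to shorten one URL,
    // the production interface would probably deserve another look.
    class UrlShortenerTest {

        @Test
        void shortensAndResolvesAUrl() {
            UrlShortener shortener = new UrlShortener();
            String key = shortener.shorten("https://example.com/some/long/path");
            assertEquals("https://example.com/some/long/path", shortener.resolve(key));
        }

        // Minimal implementation so the sketch is self-contained.
        static class UrlShortener {
            private final Map<String, String> urlsByKey = new HashMap<>();

            String shorten(String url) {
                String key = Integer.toHexString(urlsByKey.size() + 1);
                urlsByKey.put(key, url);
                return key;
            }

            String resolve(String key) {
                return urlsByKey.get(key);
            }
        }
    }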

Misconception 2: fail the test, then make it pass right away. When you have written your test code, filled in the production code to get it all to compile and seen the test fail, it is very tempting to just fix things. Make the changes to have the test pass. Actually, you can do that. But there are at least two reasons not to.

First, I strongly prefer an incremental approach: fix only the problem reported by the test! If the test says “null pointer exception”, fix exactly that. Run the test again, get the next failure, and fix that one. This is the convenient (lazy) approach: you just let the test drive you. It also results in minimal increments, which is very helpful if another test case breaks while you are changing the production code.

Second, fixing the failing test case right away throws away a lot of information in the process. When a test fails, it provides you with valuable information about what went wrong. If you cannot immediately understand what the problem is, maybe you should improve your test or production code. For example, if you get a “null pointer exception”, maybe error handling or an assert earlier in the production code could make sure your program never gets into that kind of corrupted state. Alternatively, you could improve your test code with all kinds of helpful diagnostics. The idea is: if it takes time for you to understand what went wrong today, imagine how much time will be wasted when the same test case fails in six months. “Growing Object-Oriented Software Guided by Tests” says you should “let the tests talk to you”. There you go, you are test-driven.
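
As a small sketch of what “helpful diagnostics” can mean in practice (Java/JUnit, with a made-up OrderService):

    import static org.junit.jupiter.api.Assertions.assertEquals;
    import static org.junit.jupiter.api.Assertions.assertNotNull;

    import org.junit.jupiter.api.Test;

    // Diagnostics sketch (made-up OrderService): instead of letting a null blow
    // up inside the final assert, fail early with messages that will still make
    // sense when this test breaks six months from now.
    class OrderServiceTest {

        @Test
        void placedOrderGetsAConfirmationNumber() {
            OrderService service = new OrderService();

            Order order = service.place("book-123", 2);

            assertNotNull(order, "place() returned null - was the order rejected?");
            assertNotNull(order.confirmationNumber(),
                    "order was created but no confirmation number was assigned");
            assertEquals(2, order.quantity(), "quantity was not preserved on the order");
        }

        // Minimal types so the sketch is self-contained.
        record Order(String confirmationNumber, int quantity) {}

        static class OrderService {
            Order place(String productId, int quantity) {
                return new Order("C-" + productId, quantity);
            }
        }
    }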