How Do We Know Our Tests Work Tomorrow?

When writing a test case, good practices suggest that we verify the test can fail. But how can we know the test code doesn’t break later?

How do we know our test code works? It helps to apply a test-first mentality to the largest extent possible. This means writing a failing unit test first and then writing the production code that makes it pass. As a bonus, “failing first” verifies that the test does what it is meant to do: it checks the behavior of your code, and it actually can fail when the code is broken. Test code, like any code, is expected to have bugs in it, so the correctness of your test case needs to be verified too. All good.

Robert Martin compares unit tests to dual-entry bookkeeping, which is a nice analogy: “Accountants who don’t hold to the GAAP [Generally Accepted Accounting Principles] tend to wind up in another profession, or behind bars. Dual Entry Bookkeeping is the simple practice of entering every transaction twice; once on the debit side, and once on the credit side.” The tests (debit) make sure the production code (credit) is correct, while the production code makes sure the tests are correct. At least this holds when the test code is first written using the test-first methodology above. But does it still apply later on?

I have said the following on several occasions: “How can this not work? We have a test case covering exactly this!” Digging deeper, it turns out that the test code broke at some point, and I dare not look in the code repository to see how long ago. Maybe the code was refactored, and an assert was somehow invalidated or removed. (Examples in funny languages: I had a bash script testing an app. I accidentally removed an “exit 1” statement, causing the script to never fail. Found out by accident when the app was malfunctioning! On another occasion, I wrote this in JavaScript: “test.notEquals(200, response.statusCode)”. I later refactored the test to read “response.status_code”. The problem was that in the production code, I had renamed the field to “status” (doh!), so “response.status_code” evaluated to “undefined”, which is never equal to 200, and the assert could never fail. Found out by accident when the app was malfunctioning! I also have numerous examples in Java…) We have test code to make sure our production code works. The test code is meta code, looking at the real code. But what mechanisms are there to guarantee that the test code can still actually fail? What about the meta-meta level?

Introducing the “test case test case” (meta level 2): I was thinking about using dependency injection to feed my test case (meta level 1) with a mock application (faking meta level 0). Perhaps we could feed the mock application with erroneous behavior in order to trigger the asserts in the test case (meta level 1). Fantastic! But this makes me concerned about the correctness of the “test case test case”. Hmm, better go into meta level 3. Ok ok, sort of kidding of course.

So that was half a joke, but I’m serious about the problem. I see this over and over, and it bothers me that we have tests but cannot be sure they do what they should. I would not be surprised if, over time, more and more test cases silently break in most software projects. Sure, I sometimes go into my production code and “return zero instead of 200”, just to make sure something breaks in the test suite, but it doesn’t really scale… I want regression tests for my test cases! :) Any ideas?
