Developer's Dilemma

A blog about the joys and perils of software development

How Do We Know Our Tests Work Tomorrow?

When writing a test case, good practices suggest that we verify the test can fail. But how can we know the test code doesn’t break later?

How do we know our test code work? It helps to apply a test-first mentality to the largest extent possible. This means writing a failing unit test followed by writing production code to pass the test. As a bonus, “failing first” verifies that the test does what it is meant to do: that it verifies the behavior of your code. We make sure the test actually can fail when the code is broken. As with any code, test code is expected to have bugs in it, so the correctness of your test case needs to be verified. All good.

Robert Martin compares unit tests to dual-entry book keeping, which is a nice analogy: “Accountants who don’t hold to the GAAP [Generally Accepted Accounting Principles] tend to wind up in another profession, or behind bars. Dual Entry Bookkeeping is the simple practice of entering every transaction twice; once on the debit side, and once on the credit side.” The tests (debit) make sure the production code (credit) is correct, while the production code makes sure the test is correct. At least this holds when first writing the test code using test-first methdology above. But does it apply later on?

I have said the following on several occasions: “How can this not work? We have a test case covering exactly this!” Digging deep, it turns out that the test code has broken at some point, and I dare not look in the code repository how long ago. Maybe the code was refactored, and an assert was somehow invalidated or removed. (Examples in funny languages: I had a bash script testing an app. I accidentally removed an “exit 1″ statement causing it to never fail. Found out by accident when the app was malfunctioning! At another occation, I wrote this in JavaScript: “test.notEquals(200, response.statusCode)”. I later refactored it to be “status_code”. The problem was that in the production code, I had renamed it to “status” (doh!), giving “response.status_code” the value “undefined” which causes the assert to never fail. Found out by accident when the app was malfunctioning! I also have numerous examples in Java…) We have test code to make sure our production code works. The test code is meta code, looking at the real code. But what mechanisms are there to guarantee that the test code can actually fail still? What about the meta-meta level?

Introducing the “test case test case” (meta level 2): I was thinking about using dependency injection to feed my test case (meta level 1) with a mock application (faking meta level 0). Perhaps we could feed the mock application with erroneous behavior in order to trigger the asserts in the test case (meta level 1). Fantastic! But this makes me concerned about the correctness of the “test case test case”. Hmm, better go into meta level 3. Ok ok, sort of kidding of course.

So that was half a joke, but I’m serious about the problem. I see this over and over and it bothers me that we have tests, but we cannot be sure they do what they should. Over time, I would not be surprised if more and more test cases break in most software projects. Sure, I sometime go into my production code and “return zero instead of 200″, just to make sure something breaks in the test suite, but it doesn’t really scale… I want regression tests for my test cases! :) Any ideas?

The Random Sustainable Coder

Looking for good books? Here are two on software development and one on… randomness.

The Clean Coder: A Code of Conduct for Professional Programmers” by Robert Martin addresses what it takes to be successful in the long term as a software professional. First and foremost, a software professional takes responsibility for himself, his peers and his product. For example, he speaks the truth. He voices his opinion if a deadline will not be met. Martin has a series of conversations between different roles (such as programmer, project leader) which are pretty entertaining. There are lots of topics in this book that are important if you want to be long-lived and respected as a software professional. If you don’t have time to read the whole book, at least read the first chapter, which is gold.

Sustainable Software Development: An Agile Perspective” by Kevin Tate addresses what it takes to be successful in the long term when developing a software product. It is not unusual to see software products released after a year and then maintained for ten more years. This sets some specific requirements on the software, the development environment and the mindset of the developers. For example, the software must be able to accommodate change at reasonable cost and without breaking. In order to be able to deliver fixes and new features, the product must always be in good shape. Preferably good enough to be delivered every day, if so desired. This means the development process and the mindset of the developers must be heavily focused on quality. We must shift from defect detection (finding bugs) to defect prevention (avoiding bugs). Root cause analysis is key. Read the book, the message is great.

Fooled by Randomness: The Hidden Role of Chance in Life and in the Markets” by Nassim Nicholas Taleb is a very interesting book! It is about randomness and how it affects our life. For example, how much of the success in your own career is due to chance? According to the author, if you’re a stock broker, it’s probably close to 100% (Taleb does not go easy on his fellow finance colleagues! :). If you’re a software developer, my gut feeling is that luck is much less significant. Re-live your life a million times, and your average life salary would probably still be around your current. Not so for the stock broker.

Taleb also describes an interesting scam. Send one million emails saying “the stock market will rise next week”. Send another million emails saying “the stock market will fall next week”. Next week, if the stock market rises, keep only the first million email addresses. To half of them, send an email saying “stocks will rise next week”, and to the second half “stocks will fall”. Every week, keep only the half where you predicted correctly. After 10 weeks you will have hundreds of people thinking you have super powers… This is called “survivor’s bias”. Taleb applies the exact same reasoning when a stock broker asks him for a job. Taleb won’t ask “How did you manage to do so well in the stock market in the past?”, but rather “How many like you were there from the beginning?”! This is a fantastic book, read it.

Sustainable Software Development, Part 2: Managing Complexity

As the code base grows, the complexity of your code increases. The pace of software development often slows down as the product matures. Why is this? What can we do to manage complexity?

Sustainable software development is about retaining software development velocity over the life-time of your product. In part one, I covered the technical debt side of sustainable software development. In essence, when you postpone refactoring, bug fixing, testing etc., you accumulate debt. As all debt, it has to be re-payed. But even worse, you will pay interest. Every time you think about, swear about, discuss, plan or anything (but fix) your known problems, you pay.

In this second part, we address another challenge to a sustainable pace of development: the increased complexity of the software. In order to avoid letting the complexity make your life miserable, we need to manage the complexity. For brevity, we will touch only upon these four areas:

  • Simplicity
  • Dependencies
  • Hierarchy
  • Abstraction

Fred Brooks makes the distinction between essential and accidental complexity (also discussed in his book “The Mythical Man-Month”). Essential complexity comes from your problem domain. The problem will have an inherent complexity you cannot get away from. Accidental complexity comes from the solution you’ve chosen. The “Keep it simple, stupid!” (KISS) design principle encourages developers to prefer simple solutions over those more complex. Thus, in order to manage complexity, it is important to keep the accidental complexity to a minimum. For example, dividing your code using the Single Responsibility Principle is one way, so that different pieces of functionality are not tangled up in a single class.

Another example of accidental complexity is the introduction of unnecessary dependencies in your software. Every subset of your code will have connections and communication with other subsets of your code. In a naive software implementation, the number of connections would grow wildly with the size of the software. The interconnectedness of your code is one aspect of complexity that needs to be managed. Decomposition and decoupling will help you, and they should be applied to all levels of your system: service, sub-system, module, class, function etc. Various design principles (e.g. Dependency Inversion Principle) and patterns can guide you.

Steve McConnell discusses hierarchy and abstraction in his article “Keep it simple” (and in his book “Code Complete”). Structuring your software as a hierarchy is a means of decomposing your solution into more manageable pieces. Normally, different kinds of hierarchies exist within your code. For example, think about the difference between the relation between objects (“which object creates/destroys/contains/depends on/owns the memory of which”) and the relation between functions (“which function calls which”). One architectural pattern that can help the high-level structure of your code is Layer (defined in the book “Pattern-Oriented Software Architecture: A System of Patterns” by Buschmann et al.). One upside of a structured approach like Layer is that tooling, such as automatic mapping which classes depend on which, can help you enforce your structure over time.

Abstraction is all about what level of detail is visible at what level (see also my post on abstraction). For example, the file system is an abstraction hiding the hard-drive with its tracks and sectors. When writing a program, you refer only to “file x”, and not to “track y, sector z”, which helps tremendously. Abstraction will help you reason about your program. Abstractions are largely domain dependent. Your domain will decide what is a good level of abstraction, hiding unnecessary detail while not constraining you.

Last words from McConnell’s “Keep it simple” article: “Neither hierarchies nor abstractions reduce the total number of details in a program — they might actually increase the total number. Their benefit arises from organizing details in such a way that fewer details have to be considered at any particular time.”

How do you fight complexity in your daily work?

Characteristics of a Software Professional

At work, I have been challenged with the question “What are the most important characteristics of a software developer?“. This is a tough question, and no matter what you decide to include, you have to leave something out.

I’ve been part of software development projects in various companies. The successful projects teach you invaluable lessons. The dysfunctional projects even more so. Inevitably, a list of characteristics would include qualities I value and desire in my fellow colleagues, as well as characteristics I’d expect them to want in me. (So this is also a long todo list for me. ;)

To get some kind of structure, I decided on three main categories: Professionalism, Long-term code and Quality mindset.

Professionalism

Take responsibility

You are a professional developer, and professionals act responsibly. If things go wrong,
take responsibility. Understand why things went bad. Learn, adapt and make sure it
never happens again. When faced with a difficult choice, “do the right thing”. Optimize for the long run even if it results in more work today. Be a team player, even if this means saying no. Speak the truth.

Know your product

We’re part of a business. Without successful products, there will be no business.
Know your users and their needs. Use your product, use competitor’s products,
visit customers and watch them use your product (from purchase/download to
installation to day-to-day usage to upgrade to uninstall and so forth).

Continuous learning

Be humble and practice relentless inquiry. Question everything (since everything has potential for improvement), embrace questions on your code, realize others’ suggestions might improve your work, be happy other developers change “your” code
(it’s good enough to understand!). Improve yourself and others. Become a better developer by reading books, papers, blogs etc. Watch instruction videos, listen to podcasts, try new technologies, participate in open source development, discussion groups etc. Discuss your code and what you read with your peers. Talk to developers of other specialities. Teach others what you know well.

Long-term code

Communication

You write a line of code once, but it is read hundreds of times. Invest time to write code easy to read for others. Write code at the right level of abstraction, abstract enough for expressiveness but without hiding necessary detail. Adhere to design principles as
they capture proven ways to high-quality code. Apply design patterns to better communicate your intent.

Maintainability

Decouple the different parts of your software, on every level – sub-system, module, class and function. Write extendable code, so that you can add functionality with minimal change to existing code. Avoid technical debt, and repay debt as soon as possible. Interest has to be payed on all debt.

Proven functionality

Never deliver code unless you’ve proven it works. If you don’t test it, it will be faulty.
Write testable code. Make it testable from the start, later it will be too expensive. Automate your tests to run before check-in, after check-in and nightly. If the tests are not executed automatically, they will not be updated and soon be obsolete. Without automated tests, no-one will dare change any code. Write fast and reliable unit tests. Write realistic integration tests. For each new feature, write acceptance tests. Automate everything.

Quality mindset

Quality is your responsibility

You, as a developer, is responsible for the quality of the product. Other roles can help you spot problems or give you more time, but they cannot affect quality. Never ship anything without being certain of its correctness.

Find bugs early

Find bugs early in the development process. If a bug can be found by the developer, it should be. If you need tools, get them. If you need hardware, get it. If a bug is found late, understand why it was not found earlier. Fix the process so that bugs of this kind never slips through. Automate.

Fix it now

If you find a bug, fix it now instead of filing a bug report. Ask your colleagues to help out. You will save time. File bug reports on things that couldn’t be solved in half a day.
Do things properly the first time. If you don’t have time to do it right today, when are you ever going to find time? Give time estimates that allow you to produce quality products. Think about what is stopping you from being more productive. Fix it, and then move on to the next thing stopping you.

If you’re interested in these things, you should have a look at Poppendiecks’ “Lean software development” books, Senge’s “The fifth discipline“, Martin’s “Clean coder“, McConnell’s “Code complete“, McConnell’s video “World class software companies” and others. I also provide some resources in previous blog posts (e.g. videos and books).

What characteristics do you value in a software developer?

Great Videos (and Articles) for a Software Developer

Commuting to work by public transportation? Doing the dishes? Regardless, you might want a distraction. I found some time during the holidays to look at some new articles and videos on the web. (This is a digest from my Twitter feed. Follow me @johnnybigert.)

Great stuff:

  • Steve McConnell – brilliant as always. What makes a World Class Software Company? video
  • Great talk on API design by Joshua Bloch. video
  • “JavaScript the Good Parts” video. Here’s something you (or I) never expected me to say: JavaScript is a really cool language! See for yourself: video
  • No refactoring? Good article on design principles: “SOLID is append only”. article
  • Nice paper on how Spotify’s peer-to-peer technology works (by former colleagues from KTH :) article

Good stuff:

  • Demo of how to set up an Windows Azure project and run it locally. Extensive prerequisites! video
  • Local file storage, 3d animation, audio analysis etc. in HTML5 today! Demo/presentation video. video
  • Intro video, Amazon Web Services: wow! CloudFront, Queues, SimpleDB, MechTurk, Private Cloud, DNS… It goes on and on! video
  • Confusing presentation but great message: “Jellyfish makes sure your common JavaScript code works for any browser/target”. video
  • Nice intro to jQuery from a twelwe-year old kid(!). I’m humbled. video
  • Domain driven development? The “Supermodel” DDD framework: “make simple MVC stuff easy and complex stuff possible”. video

Behavior Driven Development

Behavior Driven Development (BDD) is a flavor of Test Driven Development (TDD). In BDD, we have a specification focus instead of a test focus. What does that mean? And how can it help us write better software?

I just watched Dave Astels’ talk on “Beyond Test Driven Development: Behaviour Driven Development” (video). The first time I ran across Behavior Driven Development (BDD), it struck a chord. It is branch of Test Driven Development (TDD) that suites my style. And as always, hearing someone else’s viewpoint makes you think about your own practices. I definitively learned a lot from the video which describes the rSpec BDD framework (for Ruby). Let’s review some highlights of BDD.

As mentioned, BDD is related to TDD. In the video, Astels stresses the fact that we should try to get away from the testing mindset, and focus on writing specifications of the system behavior instead. In BDD, we have a specification focus instead of a test focus. The name of a BDD test case should read as if it specifies a small piece of system behavior. For example, let’s say you’re writing software for a web shop (as in the wikipedia article on BDD). One test case could be named “refunded items should be returned to stock” (or refundedItemsShouldBeReturnedToStock). The test code is there to clarify the details of the specification.

From a philosophical standpoint, the word “test” makes me think of testing something existing, while “specification” makes me think of specifying the behavior of something not yet built. It might help your thought process when decoupling the test cases (= the specification) from the underlying implementation. This ties into one of the drawbacks we often see with traditional unit testing. A recurring pattern is e.g. a ShippingOrder class with a corresponding ShippingOrderTests test class. This tightly couples the test code with the production code. Refactoring becomes painful as test classes might become obsolete if you decide to split a production code class in two.

Instead, Dave Astels mentions that in BDD, you should organize your test classes not around production code classes, but test fixtures. Each test class has a test fixture, and that fixture captures a specific system setup. (In short, a fixture creates objects from your production code classes in a setUp() function and cleans everything up in a tearDown() function. The setUp() function is executed before each test case in the test class and tearDown() is executed after each test case.) Thus, all tests in a test class would run against the same system setup. For example, if several test cases revolve around returning or refunding items in your shop, create a RefundItemTests test suite.

Since system behavior is not restricted to a single method or class, the notion of “unit” in “unit testing” needs to be widened. Any reasonable subset of the code could participate in the test fixture (even the full system, as described in my post on edge-to-edge unit testing). I like that, since it allows you to find and test on the stable boundaries within your system, thus making the tests less brittle. This should be contrasted with having a one-to-one correspondence between production and test classes, which will be very brittle as the code evolves.

If you have good or bad experiences with BDD, or just random thoughts on unit testing, please drop a line below.

Sustainable Software Development, Part 1: Managing Technical Debt

Software projects sometimes go bad. The pace of development is not sustainable. To achieve sustainable software development, we need to keep our focus on what’s important: the long-term health and maintainability of our source code.

Robert Martin has written a nice article on Scrum projects, and how applying Scrum often start out with hyper-productivity. However, most projects slow down when code size increases. After a while, we might even experience that progress approaches zero as time approaches infinity (if the project is not discontinued before it :).

Why is that? Martin lists a number of activities that are well-known to the experienced developer, such as unit testing, continuous integration and code coverage measurements. Most developers would even agree that applying them will help retaining productivity over the long run. Still, the law of least resistance often takes us down a different path. A strong focus on feature growth is very seductive, as it makes everybody happy in the short term: the customer is happy about their new features, the product owner is happy when he can satisfy the market and the team is happy when everybody commends them for their work. At least for a short while. The lack of focus on quality (such as a lack of automated unit tests and a poor continuous integration setup) will catch up with you.

Technical debt are the things you know you must do but haven’t done yet: fixing a bug, adding more unit tests, refactoring, cleaning up etc. Just as normal debt, it has to be payed back. No surprise there. Unfortunately, just like a normal debt, technical debt also come with an interest rate. You pay interest every time you say “aargh, this would have taken me ten minutes had I refactored this module” or “aargh, I just spent half a day tracking down a problem caused by a known bug”. Discussing, planning or thinking about what must be fixed is paying interest. Soon, your house belongs to the bank. Software development like this is not sustainable – it will come to a halt.

To avoid this, and to achieve sustainable software development, we need to keep our focus on what’s important: the long-term health and maintainability of our source code. Just like the debt analogy, there’s a savings analogy. Invest some money in a savings account (or the stock market), and you will receive interest instead of paying. As a start, it is necessary to make an investment in good practices: fix things now, and use tools and techniques to ensure quality. You will be able to deliver new features for a long time and your velocity will stay high, or even increase over time.

Abstraction vs Compression

In our daily communication you might hear things like “higher level of abstraction”, but what is abstraction? And how does it relate to compression?

Wikipedia says this about Abstraction: “Abstractions may be formed by reducing the information content of a concept or an observable phenomenon, typically to retain only information which is relevant for a particular purpose.” For example, a ball is an abstraction of a football and other types of balls. Abstraction in software development is about removing details from something concrete; there’s a loss of information. For example, a class for sending packets over a TCP socket is a concrete concept. An abstraction could be a Network interface with a send function. In the abstraction, we remove the details of exactly how the data is sent.

Compression, on the other hand, is about hiding information. Although not visible, the information is there and can be retrieved if necessary (like unzipping compressed content). One example is procedural programming: Reading a function should give you a good picture of what the function does. If you need more information, you can always go into the called functions for details. No information is lost.

Using an interface in object-oriented programming introduces an abstraction. In runtime, in the general case, you cannot know which class implements an interface. You lose information. This can make your object-oriented code hard to understand, review etc. Some may argue that this is not a problem: If your interfaces are clear, it does not matter who implements it. It should be sufficient to know that the subclass carries out the work according to the specification of the interface (honoring the Liskov Substitution Principle). Still, the information loss can be a challenge.

Other uses of the word “abstraction” in software development may appear when people mention things like “programming at a higher level of abstraction”. The Network interface from above is a good example. It is on a higher level of abstraction than the more low-level TCP implementation (which itself hides raw socket operations). What if we had a Network class with a send function, and it implementing the TCP socket sending. Is this an abstraction? The public functions of Network hides the details of TCP packet sending etc. By going into the Network class, we can retrieve all details of exactly how packets are sent. Information is hidden, but not lost. If no information is lost, this is rather “programming at a higher level of compression”. :)

After putting you through this, I must say that in our daily communication, we (myself included) don’t pay much attention to the distinction between abstraction and compression: “abstraction” normally means any kind of information hiding or removal. But it’s useful to know the difference since it affects the understanding and readability of your code.

The Diamond Shape

How can your classes talk to each other if they don’t know of each other? Hint: a diamond might come handy.

Revisiting a topic from my KTH lecture in October, I wanted to discuss the “diamond shape”.

Imagine you want to write a simple networking application. You have at least two responsibilities represented: first, an Application class is responsible for implementing the application logic. Second, a Network class is responsible for implementing sending and receiving packets over the network. As an example, let’s assume that when the user presses a button, the Application needs to tell the Network class to send some data. When a response is received over the network, the Network class needs to tell the Application class that some data was received. Each class needs to talk to the other class.

The interaction between Application and Network requires bi-directional communication. Despite this, we don’t want the two classes to refer directly to each other, since this would lead to problems (tight coupling, poor testability, unnecessary knowledge in the Network class about the full public API of Application etc.). Thus, we want bi-directional communication between two classes, but they mustn’t know of each other or depend on each other. Quite a dilemma.

To resolve the dilemma, we introduce two interfaces. The Network class implements an interface INetworkSender (with a function sendData(data)). When the Application wants to send something, it calls a function in the INetworkSender interface which in turn ends up in the Network class. Conversely, we let Application implement INetworkReceiver (with a function onDataReceived(data)). When a network packet arrives, the Network class calls a function in the INetworkReceiverInterface which ends up in Application.

What is important here is the way the Application class communicates with the Network class and vice versa. If you draw a diagram of the two classes and two interfaces, the picture resembles a diamond shape. The right-hand path is where Application talks to Network (top to bottom) through INetworkSender. The left-hand path is where Network talks to Application (bottom to top) through INetworkReceiver. In general, one path is for function calls in one direction while the other path is in the other direction.

The diamond shape decouples our classes despite bi-directional communication. This is a powerful thing and it’s widely used when building large software systems. In particular, it is used in the Layers architectural pattern. We keep the upper layers unaware of lower layers to obtain a flexible and reusable solution.

The decoupling of Application and Network helps us keep the application logic from the networking. It also gives us testability. If we let the Application class talk to a mock class instead of Network, we can verify the application logic fast and reliably, without complicating things with a real network. Also, with a clear separation, it is easier to spot where application logic depends on network behavior.

Object-Oriented Programming Lecture at KTH: Slides etc.

Thank you all that participated in my lecture at KTH October 26, 2011. I had a lot of fun, and we had some good discussions. For you who were not there, it was about object-oriented programming and how to write good code. We used some example code to discuss OO principles and testing. Here’s the presentation (pdf)!

A couple of posts of relevance: