Zero Tolerance: Writing Good Software Is Tough

I want to work with clean and readable code, with decent test coverage. Thus, when I run across some small flaw in the code (e.g. code duplication in some test scripts), it should be fixed. But I admit, sometimes I just convince myself: “hm, I’m so close to getting this feature up and running, let’s fix this code after that”. This almost always gives me more agony later.

So when I had the opportunity to participate in a new project with no dependencies on legacy code, I thought I would try something new, just for fun. Let’s call it zero tolerance: “when finding flawed code, fix it now”. Flawed code could be anything: smelly code, confusing script names, non-descriptive function names, bad file structure, incomprehensible test cases, you name it. Let nothing pass!

Trying it out, my first observation was that it was painful! You have this new, fully functional feature within reach, and then some crappy code that requires a one-day refactoring appears. It is tough on your mental health to ignore the easy-to-reach reward and do the really boring stuff instead. Especially when your threshold for what needs to be fixed is so low (actually, it’s zero). Sometimes it will feel like all you do is fiddle with old code.

Stick or carrot?

The upside is that all the punishment will make you learn (stick-style rather than carrot). You know that if you don’t concentrate and do things properly the first time, you will suffer later. And like I said, the suffering will inevitably happen when you are just about to write those last lines to get your system ready for a test spin. I would say it motivates you to produce code of higher quality from the beginning.

When I first started doing the zero tolerance thing, I expected it to get easier over time. The funny thing is, the amount of code that needs to be fixed seems to be constant over time! How is that possible? Maybe I am just sloppy and introduce lots of bad code? (Probably, but the punishment makes me improve! :) Maybe it is just impossible to foresee all the problems that may appear? Maybe. But I also think something else is at play.

Large software projects often grind to a halt. The complexity of the software becomes overwhelming and new features are delivered more and more slowly. Why is that? With increasing amounts of code, there are more opportunities for sloppy code, and it will pile up. Without proper care, the amount of code that needs to be fixed will grow over time (probably much faster than the code size grows; the broken windows theory doesn’t help either). So maybe staying constant over time is really a great outcome.

Like I said, I was hoping that the workload from code fixing would actually decrease over time. Maybe software is just hard (in terms of essential complexity), and there’s not much we can do about it. We will have to do our best not to make things worse by introducing too much accidental complexity. I will keep doing the zero tolerance thing for a while, and I’ll report back on my findings if something interesting happens.

If you decide to try the zero tolerance approach, be warned: you should probably do it on a new piece of code. If you choose a piece of code that was written with a normal level of acceptance for design flaws and technical debt, you will probably lose your mind before you get any new functionality in there!

Eclipse Won’t Run All My JUnit Tests

Here’s an Eclipse/JUnit4/Maven problem I came across; maybe this “solution” can help someone else.

I was using Eclipse to unit test some Maven projects in Java. I noticed that Eclipse only executed two out of my four JUnit4 unit tests. Right-clicking a test case and choosing “Run” on it gave me an error: “Unrooted test”. The more popular tips from Google (e.g. Nabble and Stack Overflow) suggested the code was ok, but Eclipse was still skipping my tests. Then I found this post on RTFM which said “recompile your code using mvn clean install”. Ridiculously simple, but it worked for me. (Ironically, running Maven is something I normally do twenty times per day…)

The funny thing is that JUnitMax (which automatically runs all tests when a file is saved) managed to run the skipped tests even though Eclipse couldn’t! I could tell because it correctly passed/failed the tests when I changed them. I am still confused about that. :)

Decoupling Starts with an Interface, but Where Does It End?

Writing decoupled software components means writing them so that they depend on each other as little as possible. If one of your classes uses another class directly, they will inevitably be coupled together. Use one, and you will have to use the other. To decouple components, we normally think of letting our classes talk to interfaces instead of concrete classes. Any subclass can hide behind the interface, so the coupling between the classes is reduced. One good example of “talking to interfaces” is the Factory method design pattern. Instead of using new on a concrete class, you will ask an interface to create the object for you. But decoupling is more than just talking to interfaces.
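
Here is a rough sketch of what “talking to interfaces” can look like. All class and method names below are invented for illustration, not from any particular project; the point is that the client only ever sees two interfaces: the thing it uses and the factory that creates it.

```java
// Hypothetical example: the client depends only on interfaces.
interface Notifier {
    void send(String message);
}

interface NotifierFactory {
    Notifier create();
}

// The only code that knows about the concrete class.
class EmailNotifier implements Notifier {
    @Override
    public void send(String message) {
        System.out.println("emailing: " + message);
    }
}

class EmailNotifierFactory implements NotifierFactory {
    @Override
    public Notifier create() {
        return new EmailNotifier();
    }
}

// The client never uses `new` on a concrete class; it asks the factory interface instead.
class Client {
    private final NotifierFactory factory;

    Client(NotifierFactory factory) {
        this.factory = factory;
    }

    void run() {
        factory.create().send("build finished");
    }
}

class FactoryDemo {
    public static void main(String[] args) {
        new Client(new EmailNotifierFactory()).run();
    }
}
```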

When I first started programming in Java, I came across Spring and its inversion of control (IoC) container. Compared to the Factory method above, Spring IoC takes object creation decoupling a step further. In XML files, you specify which objects to create and which arguments to give to their constructors (e.g. other objects). Coming from C++, it was pure magic the first time I saw it. Nowhere in the code do your classes refer to each other. Now, that is decoupling!
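
As a rough sketch of the idea (the bean names, the beans.xml file and the Notifier/Client types are my own examples, and the XML in the comment is simplified, omitting the schema declarations a real Spring file needs), the wiring lives entirely in XML and the code only asks the container for a finished object:

```java
import org.springframework.context.support.ClassPathXmlApplicationContext;

// beans.xml (on the classpath) might contain something like:
//
//   <bean id="notifier" class="com.example.EmailNotifier"/>
//   <bean id="client" class="com.example.Client">
//     <constructor-arg ref="notifier"/>   <!-- constructor injection -->
//   </bean>
//
class SpringWiringDemo {
    public static void main(String[] args) {
        ClassPathXmlApplicationContext context =
                new ClassPathXmlApplicationContext("beans.xml");
        try {
            // Client here stands for the interface-based client from the sketch above;
            // its source never mentions EmailNotifier anywhere.
            Client client = context.getBean("client", Client.class);
            client.run();
        } finally {
            context.close();
        }
    }
}
```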

Later, I started working with OSGi. It is a Java framework which allows for independent life-cycles of modules (called bundles). For example, you could replace a bundle with a newer version during run-time. Other bundles won’t notice they are communicating with something new. Bundles communicate with each other by publishing and consuming services, which are just plain interfaces. XML is used to specify what services are published and consumed by a bundle. Similar to above, individual modules never refer to each other in the code. Again, that is decoupling.
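
A very rough sketch of the publish/consume idea, using the plain OSGi API rather than the XML-based route described above (GreetingService and the activator classes are invented names):

```java
import org.osgi.framework.BundleActivator;
import org.osgi.framework.BundleContext;
import org.osgi.framework.ServiceReference;
import org.osgi.framework.ServiceRegistration;

// The service contract: the only thing the two bundles share.
interface GreetingService {
    void greet(String name);
}

class GreetingServiceImpl implements GreetingService {
    @Override
    public void greet(String name) {
        System.out.println("Hello, " + name + "!");
    }
}

// The providing bundle publishes the implementation under the interface name.
class ProviderActivator implements BundleActivator {
    private ServiceRegistration<GreetingService> registration;

    @Override
    public void start(BundleContext context) {
        registration = context.registerService(
                GreetingService.class, new GreetingServiceImpl(), null);
    }

    @Override
    public void stop(BundleContext context) {
        registration.unregister();
    }
}

// The consuming bundle only ever sees the GreetingService interface.
class ConsumerActivator implements BundleActivator {
    @Override
    public void start(BundleContext context) {
        ServiceReference<GreetingService> ref =
                context.getServiceReference(GreetingService.class);
        if (ref != null) {
            context.getService(ref).greet("OSGi");
        }
    }

    @Override
    public void stop(BundleContext context) {
    }
}
```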

Web APIs, such as REST or SOAP APIs over HTTP, are often used to provide access to web or cloud services. More and more enterprises also expose their internal sub-systems as web services. By assembling software from web APIs, you can combine software components that may execute anywhere in the world. The pattern here seems to be that the larger the software component, the more decoupling is possible (or required?). Components decoupled through interfaces still need to be compiled together; components decoupled through web APIs need not even be in the same time zone. The downside is that the more powerful decoupling techniques require a lot of work.
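
As a small sketch of coupling through a web API (using the java.net.http client that ships with newer Java versions; the URL is a placeholder I made up), the only contract the caller knows is the shape of the HTTP endpoint, not any class in the provider’s code base:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class RestClientDemo {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://api.example.com/orders/42")) // placeholder endpoint
                .header("Accept", "application/json")
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + ": " + response.body());
    }
}
```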

Combining decoupling on different levels will make your system more resilient to change and open it up for use in new and unexpected ways. The next time you think of ways to make your code more loosely coupled, remember that decoupling is not only about interface classes!

Strands, Concurrency and Message Passing

When writing multi-threaded programs, we need to protect data from concurrent access by multiple threads. Using locks for synchronization makes software development difficult, with run-time problems such as deadlocks and live-locks. There are also design problems: locks break modularity since they do not obey module boundaries. It is very possible that a lock taken in one module will cause a deadlock in another. This might require exposing locks that should have been private (for an example, see the article Beautiful Concurrency).
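
To make the modularity problem concrete, here is a small invented sketch: each module guards its own state with its own lock, but calls into the other module while holding it. Neither module looks wrong in isolation, yet if one thread runs a() while another runs b(), each ends up waiting forever for the lock the other holds.

```java
// Hypothetical modules; the deadlock only becomes visible when you look at both together.
class ModuleA {
    private final Object lockA = new Object();
    private ModuleB b;

    void setB(ModuleB b) { this.b = b; }

    void a() {
        synchronized (lockA) {
            b.helper();              // calls into ModuleB while holding lockA
        }
    }

    void helper() {
        synchronized (lockA) { /* touch ModuleA's data */ }
    }
}

class ModuleB {
    private final Object lockB = new Object();
    private ModuleA a;

    void setA(ModuleA a) { this.a = a; }

    void b() {
        synchronized (lockB) {
            a.helper();              // calls into ModuleA while holding lockB
        }
    }

    void helper() {
        synchronized (lockB) { /* touch ModuleB's data */ }
    }
}
```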

One solution to the above-mentioned problems is simply to avoid thread-shared data. Message passing is one way to achieve this. It is typically implemented as active objects, where each module has a thread of its own with a synchronized message queue. One downside compared to using threads-and-locks is that return values often need to be asynchronous, which makes the source code harder to follow. Also, given your computer’s number of cores and hardware threads, excessive numbers of concurrent threads will result in poor performance due to context switching and page faults. We would like a near-optimal number of threads, while still keeping the message passing paradigm.
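
A minimal sketch of an active object (the class and its methods are invented for illustration): the object’s state is only ever touched by its own single worker thread, callers communicate by posting tasks to it, and the price is that replies come back as futures rather than plain return values.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ActiveCounter {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private long count = 0; // only ever accessed from the worker thread

    // "Message" with no reply.
    public void increment() {
        worker.execute(() -> count++);
    }

    // "Message" with an asynchronous reply: the caller gets a future, not a value.
    public CompletableFuture<Long> get() {
        return CompletableFuture.supplyAsync(() -> count, worker);
    }

    public void shutdown() {
        worker.shutdown();
    }
}

class ActiveCounterDemo {
    public static void main(String[] args) throws Exception {
        ActiveCounter counter = new ActiveCounter();
        counter.increment();
        counter.increment();
        System.out.println("count = " + counter.get().get()); // prints 2
        counter.shutdown();
    }
}
```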

Some time ago, I came across the concept of strands: message passing that uses a fixed number of threads while still supporting a very large number of message receiver objects (called strands, i.e. thread contexts). Conceptually, you can think of a strands handler as a set of message receiver objects, a job queue and a fixed set of worker threads. When a worker thread becomes available, the first job in the queue is executed. Executing the job means passing a message to a receiver. The job contains the message and a receiver identifier. The identifier uniquely identifies one of the message receiver objects in the strands handler. Typically, “passing a message to a receiver” means executing a member function on the message receiver object with some parameters. When the execution of the member function finishes, the thread is free for new jobs.

The important thing is to guarantee that you do not read/write the same data at the same time from multiple threads. To achieve this, the strands handler makes sure each message receiver object is assigned to at most one thread at a time. This also means the message receiver objects should not be accessible outside of the strands handler.
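
Here is a rough sketch of the idea in Java (names are mine; this mirrors the well-known serializing-executor pattern rather than any particular strands library): every task posted to the same Strand runs in submission order and never concurrently with another task on that strand, yet all strands share one fixed pool of worker threads.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Tasks submitted to the same Strand never run concurrently,
// yet all strands share one fixed pool of worker threads.
final class Strand implements Executor {
    private final Queue<Runnable> queue = new ArrayDeque<>();
    private final Executor pool;            // shared fixed-size worker pool
    private Runnable active;                // task currently scheduled on the pool

    Strand(Executor pool) {
        this.pool = pool;
    }

    @Override
    public synchronized void execute(Runnable task) {
        queue.add(() -> {
            try {
                task.run();
            } finally {
                scheduleNext();             // free the worker for the next job
            }
        });
        if (active == null) {
            scheduleNext();
        }
    }

    private synchronized void scheduleNext() {
        active = queue.poll();
        if (active != null) {
            pool.execute(active);
        }
    }
}

class StrandDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4); // fixed number of threads
        Strand receiver = new Strand(pool);
        // Messages to the same receiver are processed one at a time, in order.
        for (int i = 0; i < 10; i++) {
            final int n = i;
            receiver.execute(() -> System.out.println("message " + n));
        }
        Thread.sleep(200);                  // crude way to let the demo finish
        pool.shutdown();
    }
}
```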

The C++ Boost library has an implementation of strands in Boost.Asio. Unfortunately, I don’t know of a strands library for Java. Let me know if you know of one!

Update (May 7, 2011): A colleague of mine made me aware of the concept of an actor, a concept very similar to a strand, but more elegant, I must say. And there are libraries for Java, e.g. GPars (see also the example code in Dr Dobb’s).

Lean Software Development

I just finished reading the book “Implementing Lean Software Development – From Concept to Cash” by Mary and Tom Poppendieck. In essence, “lean” means “reduce waste”, and “waste” means “everything that does not make your customers happier”. Many concepts in the book were pioneered by Toyota in manufacturing. Toyota then moved on and introduced lean in product development (including software).

When a customer asks you for a feature, how long does it take until they get it (calendar time)? While implementing it, how much effective time did you spend on the feature? The difference between calendar time and implementation time is waste (since waiting makes your customer unhappy). This waste can be analyzed using Value Stream Mapping (example). Typical things that take time for no good reason are waiting for approval (solution: approve within 24 hours instead of in the bi-weekly meeting) and long, fixed release cycles. I have come across really long release cycles in previous workplaces and when talking to friends at other companies. (Half a year or even a year is not unusual, which should be contrasted with the book’s claim that Toyota developed and released the Prius within a year and a half!) The solution is to release and/or deploy more often. In order to do this, we need a relentless focus on quality (= no bugs).

According to the book, having bugs in a bug tracking system is waste. Software with bugs in it is not finished work, and piling up unfinished work is waste (since it cannot be released to your customer until it’s finished). Also, someone has taken the time to enter the information, and someone must look at it to resolve the bug. Meanwhile, someone else might stumble across the same bug. All this takes time. Lean software development says you should fix bugs right now. In order to make changes that fix a bug right away, you need tests to make sure you don’t break something else. Good quality software starts with rigorous and automated tests at the acceptance and unit levels (see my previous posts: test-driven development done right and edge-to-edge unit tests). Reducing the extra work from bugs will free up resources to implement new features.

Lean software development is about continuous improvements. Every day, we should ask questions like: How do I make my customer happier? How do I reduce waste in order to release faster? But who should answer these questions and implement the improvements? Lean says it’s the people closest to the problems. Inevitably, in software development, that will be the programmers! I think this is the most appealing part of the book.

This is a fantastic book and I am sure I will read it again pretty soon. Read it!

Test-Driven Development Done Right

A couple of years ago, I had at least two misconceptions about Test-Driven Development: (1) you should write all your tests up-front and (2) after verifying that a test case can fail, you should make it pass right away. To better understand TDD, I got a copy of the book “Growing Object-Oriented Software Guided by Tests” by Freeman and Pryce (now one of my absolute favorites). Although the book does a great job explaining the concepts, it took me ten chapters to admit I had been wrong. Let’s never do that again. :)

Let me walk you through my misconceptions so that you don’t have to repeat my mistakes:

Misconception 1: write all tests up-front. Thinking about potential test cases up-front is not a bad thing. It will exercise your imagination and, with some luck, many of them will still be applicable after the code is written. But don’t waste energy trying to compile an exhaustive list of tests. At least for me, this approach didn’t work, since my imagination is far too limited to come anywhere near the final list. But most of all, I would like to get going writing some test code!

You are better off writing a few happy-path test cases. Filling in the test code will get you started working on the user interface of your classes. When the test code starts acting as a “user” of your interface, it will be obvious to you whether the API is okay or awkward to work with. The tests will drive you to improve your user interface. Creating the user interface will invariably make you think about error cases and how the API can be abused or misunderstood. You will come up with more test cases, and implement these. With some effort, but surprisingly little so, you will grow your test suite.
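
As a small sketch of what that first happy-path test might look like (ShoppingCart and its methods are invented for this illustration, using JUnit4 as elsewhere in this post), note how the test code is the first “user” that forces the API into existence:

```java
import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class ShoppingCartTest {

    @Test
    public void totalOfTwoItemsIsTheirSum() {
        ShoppingCart cart = new ShoppingCart();
        cart.add("book", 20);
        cart.add("pen", 5);
        // Writing this assertion first forces a decision about what the API should look like.
        assertEquals(25, cart.total());
    }
}

// Just enough production code to make the test pass, grown incrementally.
class ShoppingCart {
    private int total = 0;

    void add(String name, int price) {
        total += price;
    }

    int total() {
        return total;
    }
}
```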

Misconception 2: fail the test, then make it pass right away. When you have written your test code, filled in the production code to get it all to compile and seen the test fail, it is very tempting to just fix things. Make the changes to have the test pass. Actually, you can do that. But there are at least two reasons not to.

First, I strongly prefer an incremental approach: I fix only the problem reported by the test. If the test says “null pointer exception”, I fix that and nothing more. Running the test again gives another failure, which I fix in turn. This is the convenient/lazy approach; you just let the test drive you. It also results in minimal increments, which is very helpful if another test case breaks while you are changing the production code.

Second, fixing the failed test case right away throws away a lot of information in the process. When the test fails, it provides you with valuable information about what went wrong. If you cannot immediately understand what the problem is, maybe you should improve your test or production code? For example, if you get a “null pointer exception”, maybe error handling or an assert earlier in the production code could make sure your program never gets into that kind of corrupted state. Alternatively, you could improve your test code with all kinds of helpful diagnostics. The idea is: if it takes time for you to understand what went wrong today, imagine how much time will be wasted when the same test case fails in six months. “Growing Object-Oriented Software Guided by Tests” says you should “let the tests talk to you”. There you go, you are test-driven.
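
As a tiny sketch of what “letting the tests talk to you” can mean in practice (Order, the pricing rule and the test are all invented for illustration, again in JUnit4 style), compare a bare assertion with one that explains the domain rule when it fails:

```java
import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class OrderTest {

    @Test
    public void bulkOrdersGetTenPercentDiscount() {
        Order order = new Order(100 /* items */, 10 /* price each */);

        // Bare assertion: a failure would only say "expected:<900> but was:<...>".
        assertEquals(900, order.totalPrice());

        // Same check with diagnostics: a failure now explains which rule was violated.
        assertEquals("bulk orders (>= 100 items) should get a 10% discount",
                     900, order.totalPrice());
    }
}

// Invented production code for the example.
class Order {
    private final int items;
    private final int pricePerItem;

    Order(int items, int pricePerItem) {
        this.items = items;
        this.pricePerItem = pricePerItem;
    }

    int totalPrice() {
        int total = items * pricePerItem;
        return items >= 100 ? total - total / 10 : total;
    }
}
```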