Replay: Recreate Every Single Bug

Do you ever spend lots of time trying to understand and recreate a bug scenario? Is there a bullet-proof way to reproduce every single bug? By logging the right information, it should be possible.

Reproducing bugs in complex systems is often hard. Even if the use case that caused the bug is explained in detail, this may not help much. Other factors, such as database state, configuration, timers, network activity and randomness, affect code execution. Without exact knowledge of these factors, you cannot determine which path was taken through your code. Often, debugging information is written to a trace log file to mitigate the problem. However, this is intrusive and clutters the code. Furthermore, by the time a bug is found in a live system, it is too late to add more tracing to the code. We need something else.

As noted above, in order to reproduce a bug, we need to know exactly which path was taken through our code. To do that, we need to know which decision was taken at every branch (if, for, switch, etc.). In certain circumstances, this might actually be possible.

Recording the Input

First, let’s assume we have a module without concurrent data access (either single-threaded, by explicit synchronization or by design). Second, we identify incoming events to the system, such as calls on a public API, GUI user interaction or network packets. They are the entry points to your code. Third, we identify all other external sources of data that might affect the execution. For example, calling a readFromDatabase function will return some data. The use of this data will affect the code execution. Thus, we treat all reads from the database as input to our module. The same goes for configuration, random values and all the other factors mentioned earlier. Combined, we think of the incoming events and the data from external functions as the input to our module.

Fourth, after identifying all module input, we introduce a mechanism to eavesdrop on the incoming data. For each event (e.g. a userClickedButton callback or a call to a public API function), we store the function arguments to a file. Let’s call this file the interaction log. Similarly, for each external function call (such as a database read), we store the return value or exception thrown.
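To make this concrete, here is a minimal sketch of what an interaction log could look like in Java. All names (LogEntry, InteractionLog) are hypothetical, and the log is kept in memory for simplicity; a real version would serialize each entry to disk as it is written. The wrapper sketches further down reuse these types.

    import java.io.Serializable;
    import java.util.ArrayList;
    import java.util.List;

    // One entry per incoming event or external call: what was called,
    // with which arguments, and what came back.
    class LogEntry implements Serializable {
        final String methodName;   // e.g. "userClickedButton" or "readFromDatabase"
        final Object[] arguments;  // captured for later integrity checks
        final Object returnValue;  // null if an exception was thrown
        final Throwable exception; // null on normal return

        LogEntry(String methodName, Object[] arguments,
                 Object returnValue, Throwable exception) {
            this.methodName = methodName;
            this.arguments = arguments;
            this.returnValue = returnValue;
            this.exception = exception;
        }
    }

    // In-memory stand-in for the interaction log file.
    class InteractionLog {
        final List<LogEntry> entries = new ArrayList<>();

        void write(String method, Object[] args, Object result, Throwable thrown) {
            entries.add(new LogEntry(method, args, result, thrown));
        }
    }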

Replaying From File

In order to reproduce the scenario of the bug, we replay the events from the file by injecting them as function calls into the module. The module will execute its code until it reaches an external function call such as readFromDatabase. Instead of calling the function, we retrieve the return value (or exception) from the file and return (or throw) that instead.

Now, there are a couple of challenges when implementing this approach. First, we’ve restricted ourselves to code that never accesses data concurrently. There’s probably nothing we can do about this. In my experience, though, it is better to organize the software to avoid these concurrency problems anyway, since the alternative is just too painful (see post on concurrency).

Second, we want the calls from our module to an external function (e.g. readFromDatabase) to either call a real function (e.g. in the database implementation) or to replay from file. Obviously, we don’t want our module to be aware of whether we are replaying or not. In object-oriented code, we can achieve this by implementing an interface. The module talks to the interface, and behind the interface, there’s either a real entity (the database) or something that replays from file. Thus, all calls from your code to external functions must go through an interface, and never to a concrete implementation directly.
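As a sketch of this indirection (with hypothetical names throughout), the module could receive its Database through the constructor, so a wrapper can be substituted without the module noticing:

    // Hypothetical record type returned by the database.
    class DbRecord { /* fields omitted */ }

    // The module depends only on this interface, never on a concrete class.
    interface Database {
        DbRecord readFromDatabase(String key) throws Exception;
    }

    // The real implementation, used in production; details omitted.
    class SqlDatabase implements Database {
        public DbRecord readFromDatabase(String key) throws Exception {
            throw new UnsupportedOperationException("real database access not shown");
        }
    }

    // The module is handed a Database; it cannot tell a real database
    // from a logging or replaying stand-in.
    class MyModule {
        private final Database db;

        MyModule(Database db) { this.db = db; }

        // Incoming event (entry point), e.g. from the GUI.
        void userClickedButton(String buttonId) throws Exception {
            DbRecord r = db.readFromDatabase(buttonId);
            // ... module logic based on r ...
        }
    }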

Third, external function calls (such as a database read) have arguments. What if the external function decides to manipulate one of the arguments? For example, it could call a setter method or change a public member. We would have a real problem. The replayed execution would not call the setter method, and the system state would not be equivalent to the bug scenario. Now, to me, manipulating the arguments of a function is poor style. Doing it in an API (e.g. for a database system) is even worse. So hopefully, situations like these are rare. But when they do arise, we will have to restructure our code slightly if we want to be able to replay.

Implementing the Replay Functionality

Java has a very handy mechanism to support the implementation of the replay functionality: reflection. We might have a large number of interfaces through which we call external functions. Nevertheless, using reflection, we can create a single wrapper class that can handle all interfaces. Let’s call it LoggingWrapper. We would create an instance of LoggingWrapper and supply it with an instance of the real class (e.g. the database implementation). We would give the LoggingWrapper object to our module, and our module would think it is talking to the real entity (the database). When our module calls an external function (e.g. readFromDatabase), the LoggingWrapper would forward the function call to the real entity (database) and then log the return value (or exception thrown) to file. If we don’t want to log to file, we would not create a LoggingWrapper. Thus, we would not suffer any performance penalty.
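One way to sketch such a LoggingWrapper is with java.lang.reflect.Proxy, reusing the hypothetical Database and InteractionLog types from above:

    import java.lang.reflect.InvocationHandler;
    import java.lang.reflect.InvocationTargetException;
    import java.lang.reflect.Method;
    import java.lang.reflect.Proxy;

    // A single reflective wrapper that works for any interface: it
    // forwards each call to the real object and records the outcome.
    class LoggingWrapper implements InvocationHandler {
        private final Object realTarget;
        private final InteractionLog log;

        private LoggingWrapper(Object realTarget, InteractionLog log) {
            this.realTarget = realTarget;
            this.log = log;
        }

        // Creates a proxy implementing the given interface.
        @SuppressWarnings("unchecked")
        static <T> T wrap(Class<T> iface, T realTarget, InteractionLog log) {
            return (T) Proxy.newProxyInstance(
                    iface.getClassLoader(),
                    new Class<?>[] { iface },
                    new LoggingWrapper(realTarget, log));
        }

        @Override
        public Object invoke(Object proxy, Method method, Object[] args)
                throws Throwable {
            try {
                Object result = method.invoke(realTarget, args);
                log.write(method.getName(), args, result, null);
                return result;
            } catch (InvocationTargetException e) {
                // The real function threw: record the exception and rethrow.
                log.write(method.getName(), args, null, e.getCause());
                throw e.getCause();
            }
        }
    }

Wiring it up would then be a one-liner, Database db = LoggingWrapper.wrap(Database.class, new SqlDatabase(), log); and the module receives db without knowing it is being recorded.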

When we want to replay from file, we create a ReplayWrapper and give that to our module. From some other class (e.g. EventReplayer), we would read an event and its arguments (e.g. “the user clicked button X”) from the file and call the corresponding function on the module. When an external function is called (e.g. readFromDatabase), the ReplayWrapper would read a return value (or exception) from file and return it (or throw the exception). As an extension, we could also verify the integrity of the interaction log while replaying. The downside is that this requires some extra information to be recorded in the log (such as the full name and argument values of each function call). An integrity check would be able to detect a number of things, such as whether the arguments to an external function call differ from when the log was written.
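The ReplayWrapper could be sketched the same way, answering each call with the next recorded entry instead of touching the real entity (again reusing the hypothetical LogEntry type):

    import java.lang.reflect.InvocationHandler;
    import java.lang.reflect.Method;
    import java.lang.reflect.Proxy;
    import java.util.Iterator;
    import java.util.List;

    // Replays recorded results in order; never calls the real entity.
    class ReplayWrapper implements InvocationHandler {
        private final Iterator<LogEntry> entries;

        private ReplayWrapper(List<LogEntry> recorded) {
            this.entries = recorded.iterator();
        }

        @SuppressWarnings("unchecked")
        static <T> T replay(Class<T> iface, List<LogEntry> recorded) {
            return (T) Proxy.newProxyInstance(
                    iface.getClassLoader(),
                    new Class<?>[] { iface },
                    new ReplayWrapper(recorded));
        }

        @Override
        public Object invoke(Object proxy, Method method, Object[] args)
                throws Throwable {
            LogEntry entry = entries.next();
            // Simple integrity check: the replayed call must match the log.
            if (!entry.methodName.equals(method.getName())) {
                throw new IllegalStateException("replay diverged: expected "
                        + entry.methodName + ", got " + method.getName());
            }
            if (entry.exception != null) {
                throw entry.exception;  // reproduce the original exception
            }
            return entry.returnValue;   // reproduce the original return value
        }
    }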

You could imagine implementing the replay functionality per module, but a replay implementation is complex enough that duplicating it would be wasteful. A general-purpose Replay framework would be more useful. For fun, I have started sketching one. Time will tell if/when it will be in good enough shape to be released to the public. All design/code/idea contributions are welcome. :)

JUnit Max Takes Test-Driven Development to the Next Level

Automatic compilation as you type is useful, but can we take it further? JUnit Max automates the execution of your unit tests.

In the good old days, you would build your software by running make from the command line. The output would tell you where the errors were, and it would be up to you to find the file and line to fix the error. Nowadays, most IDEs are somewhat better. Normally, you press a key to build. The IDE then helps you navigate the errors by taking you to the file and line. This still involves a context switch: you have to stop writing code in order to press a key, and subsequently to look at the error messages. Do this 50 times per day, and the disruption of the context switches becomes noticeable.

Modern IDEs offer another improvement: they run the compiler automatically in the background for you. For example, Eclipse will compile the code while you’re typing and show the errors underlined in red, or as a red icon on files that fail to compile. This removes much of the context switch mentioned above and, combined with the immediate feedback, leads to fewer interruptions. This is an improvement.

One role of your compiler is actually that of a test tool. In a sense, it tests your code for a set of very specific problems, such as type mismatches, undefined functions etc. It might even warn you about some null pointer issues that would normally surface during execution. The compiler knows a lot about your code, and this is also why it makes sense to turn on “treat warnings as errors”. Without it, warnings will go unnoticed, and you will never have a clean build.

After the compiler tests, the next level of tests is the unit tests. Most IDEs allow you to run your unit tests at the press of a button. Again, you will have the context switch described above. Furthermore, the results are often not as integrated into the IDE as the compiler output. But if the code compiles automatically while we write it, why can’t the unit tests run too, nice and clean and fully integrated?

This is probably the question Kent Beck asked himself when he came up with the idea of JUnit Max. JUnit Max is an Eclipse plugin designed to support test-driven development. When you save a file, JUnit Max executes the unit tests for you. Results are shown as red icons on failed tests. Since big projects can have thousands of unit tests, JUnit Max contains some extra logic to order the execution of the tests. First of all, it will run the fastest of your unit tests first, which provides you with fast feedback. Second, it will prioritize the tests that failed recently, since they are more likely to fail again than tests that last failed a long time ago. Using JUnit Max for test-driven development is awesome. Just save your files and it will provide you with instant feedback. I use it when I do “acceptance unit test-driven development” (see post) and it really speeds things up.
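To illustrate the ordering heuristic (this is not JUnit Max’s actual implementation, which I haven’t seen, just a sketch of the idea with a hypothetical TestRecord type):

    import java.util.Comparator;
    import java.util.List;

    // Hypothetical bookkeeping per test: when it last failed and how
    // long it took to run.
    class TestRecord {
        final String name;
        final long lastFailureTimestamp; // 0 if it never failed
        final long lastRunMillis;        // duration of the last run

        TestRecord(String name, long lastFailureTimestamp, long lastRunMillis) {
            this.name = name;
            this.lastFailureTimestamp = lastFailureTimestamp;
            this.lastRunMillis = lastRunMillis;
        }
    }

    class TestPrioritizer {
        // Most recently failed tests first; among the rest, fastest first.
        static void order(List<TestRecord> tests) {
            tests.sort(Comparator
                    .comparingLong((TestRecord t) -> -t.lastFailureTimestamp)
                    .thenComparingLong(t -> t.lastRunMillis));
        }
    }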

So, what is the next level of tests after unit tests? What is the next level of tests that would be beneficial to execute when you press “save”? How about running some functional acceptance tests automatically? It would probably require a full and successful build. The same prioritization between tests as above would be helpful: run the fast and the shaky tests first. Running acceptance tests automatically could work, and could be useful.

What’s next? Load tests? Full integration tests of a system? And after that, deploy automatically? Imagine, you change a single line of code, save, and minutes later, the changes are deployed live. Now that is what I call continuous deployment! Ok, I agree, deploying without committing to a source repository is probably not what you want. :) (I guess for deployment, doing it 50 times per day is cool enough.)

Black Box Programming

Why do software developers focus so much on the inside of the system when what we really want is to correctly implement the system as seen from the outside? Is it possible to first write the code for the external behavior, and then tweak the inside? Maybe, but we might need to rethink.

I’ve developed reactive systems with asynchronous input for many years, such as telephony systems. Over time, I have become increasingly puzzled by the way we develop these systems. A few years ago, I realized what was bothering me.

In the development of a reasonably complex piece of software, there are at least three roles involved: requirements engineering, software engineering and quality engineering. The requirements people look at the system from the outside. They view the system as a black box and define the behavior of the system by its incoming and outgoing signals (as well as non-functional requirements). Testers also look at the system from the outside. They inject signals and verify expected outgoing signals (as well as non-functional requirements). The software developer wants to implement a system that corresponds to the requirements. Thus, it would be natural for the developer to think of the system as a black box (ignore the internals!). He would start out by implementing the observable behavior, perhaps by describing the incoming and outgoing signals of the system as a state machine. For most non-trivial systems, this is not what we do.

Instead, the programmer focuses on the inside of the system. Implicitly, and perhaps scattered over hundreds of thousands of lines of code, various sub-routines define the logic and outgoing signals of our system. The externally observable behavior is a side-effect of these sub-routines. In any non-trivial system, it is almost impossible to verify the correctness by visual inspection. We make our best effort to test our software to ensure the correct behavior, with the cost that it incurs. Focusing on the inside of the system is only natural, since many factors affect the source code of our system (e.g. non-functional requirements such as performance, scalability and robustness). Natural or not, it results in systems that are difficult to implement correctly. Thus, the question here is: can we do anything about it? Would it be possible to explicitly describe the externally observable behavior in a single place in our code while still being able to satisfy non-functional requirements?

What would this code look like? It would describe the logic of the system: incoming and outgoing signals and the necessary control flow along with some house-keeping data. Let’s call it the Black Box description of the system. It would not contain any implementation details (threading, database storage, performance tweaks, etc.) since implementation details are not necessary to describe the external functional behavior of the system. Everything would be described in the domain language. Let’s take an example: Assume we are implementing an ATM machine with this behavior. It would have incoming and outgoing signals towards the user interface (user input and text on the display), the bank’s server (requests and responses) and the ATM machine itself (card inserted, eject card etc.). The number of failed PIN attempts would be stored in a variable (house-keeping data). The “Too many invalid PINs” transition would read this variable.
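A minimal sketch of such a Black Box in Java could look like the following. The state, signal and interface names are all hypothetical, and a real ATM would have far more states and signals:

    // The ATM Black Box as an explicit state machine, written purely in
    // the domain language; no threading, storage or networking details.
    class AtmBlackBox {
        enum State { IDLE, AWAITING_PIN, AUTHENTICATED, CARD_RETAINED }

        private State state = State.IDLE;
        private int failedPinAttempts = 0;   // house-keeping data

        private final AtmHardware hardware;  // outgoing signals (abstraction)

        AtmBlackBox(AtmHardware hardware) { this.hardware = hardware; }

        // Incoming signal: card inserted.
        void cardInserted() {
            state = State.AWAITING_PIN;
            failedPinAttempts = 0;
            hardware.showText("Please enter your PIN");
        }

        // Incoming signal: PIN entered; pinOk would come from the bank's server.
        void pinEntered(boolean pinOk) {
            if (state != State.AWAITING_PIN) return;
            if (pinOk) {
                state = State.AUTHENTICATED;
                hardware.showText("Welcome");
            } else if (++failedPinAttempts >= 3) {
                state = State.CARD_RETAINED;  // the "Too many invalid PINs" transition
                hardware.retainCard();
            } else {
                hardware.showText("Invalid PIN, try again");
            }
        }
    }

    // Outgoing signals as an abstraction, so the Black Box runs in isolation.
    interface AtmHardware {
        void showText(String message);
        void retainCard();
        void ejectCard();
    }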

We write the Black Box code so that it is executable in isolation. This means the Black Box code must only talk to abstractions, and never directly to code that contains implementation details of the system (networking, platform specifics, optimizations etc.). Executable in isolation also means it will be unit testable in isolation. By testing the behavior of the Black Box code, we can verify the logic of the system. Obviously, other kinds of tests are required to verify the full implementation (state stored in databases, networking behavior etc., not to mention non-functional requirements). An executable Black Box would be a tremendous advantage. We would not need a full implementation to be able to try out our system. Testers could start verifying, and integration tests could begin very early on. We could also do rapid prototyping.

Many systems can be implemented simply by a state machine like the one in the ATM example. But large systems tend to be much more complex than that. First of all, we need to be able to address non-functional requirements. There are also some implementation issues to consider. I wouldn’t be able to list all challenges, but to get a feeling for how some of the issues can be addressed, let’s mention a couple of them:

  • Asynchronicity: For example, we are writing a system that requires authentication. As an implementation detail, we may choose to query another machine over the network. Thus, asynchronous results are inevitable. We don’t want the Black Box to reveal this (since our authentication procedure does not concern the end user). In our implementation, we could adapt our Black Box state machine by introducing a sub-state machine where we wait for the response from the other machine. This requires us to be able to extend or substitute a state with a sub-state machine.
  • Performance: What if multiple threads are involved in the execution of the Black Box? We might have to divide the state machine so that parts of it are executed in one thread and parts in another (or even in different processes or machines). Here, too, we need asynchronous communication between threads, much like the above. We also need the ability to execute only parts of the state machine.
  • State: Assume the flow of the state machine does not depend only on input signals but also on some other piece of data. For example, it could be a database query that answers whether a user is registered or not. Somewhere in the Black Box description we might call a function isUserRegistered(). In the implementation, we will use a real database. When executing the Black Box during testing, we let isUserRegistered() return pre-determined values for different test cases, very much like a mock object (see the sketch after this list).
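As a sketch of the last point, the query could sit behind an interface so that a test can answer it with a canned value. All names here (UserRegistry, SignupBlackBox) are hypothetical, and the test uses JUnit 4:

    import static org.junit.Assert.assertTrue;
    import org.junit.Test;

    // Hypothetical query abstraction: the implementation backs it with a
    // real database, the test below answers with a pre-determined value.
    interface UserRegistry {
        boolean isUserRegistered(String userId);
    }

    // A tiny hypothetical Black Box whose control flow depends on the query.
    class SignupBlackBox {
        private final UserRegistry registry;
        private boolean rejected = false;

        SignupBlackBox(UserRegistry registry) { this.registry = registry; }

        // Incoming signal: a user asks to log in.
        void loginRequested(String userId) {
            if (!registry.isUserRegistered(userId)) {
                rejected = true;  // outgoing "rejected" signal, simplified
            }
        }

        boolean loginWasRejected() { return rejected; }
    }

    public class SignupBlackBoxTest {
        @Test
        public void rejectsUnregisteredUser() {
            // The lambda plays the role of the mock: always "not registered".
            SignupBlackBox box = new SignupBlackBox(userId -> false);
            box.loginRequested("alice");
            assertTrue(box.loginWasRejected());
        }
    }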

To implement larger systems, we would combine the Black Boxes of our sub-systems into a larger whole. We would build a hierarchy of Black Boxes. Black Boxes would communicate with each other through incoming and outgoing signals, which translate into asynchronous signals or synchronous function calls, whichever is most appropriate. The combined system would also be unit testable. Unit testing the combined system would exercise the sub-systems, since the combined system’s behavior relies on the behavior of the sub-systems.

The idea of a Black Box description is definitely not new. There are tools for model-driven design that come very close to what is described above. For example, in Mentor Graphics’ BridgePoint, you can describe your Black Box behavior as a state machine and generate customized code by using what is called a model compiler. I’ve seen several successful projects built on BridgePoint, so the concept seems viable. But most programmers feel most comfortable when the code is at the center of things: you want full control of your code. This may also be a contributing reason why model-driven design has not taken off. So, the question here is really what can be done without advanced tools.

I developed a framework for Black Box Programming a few years ago. It addressed the challenges described above (e.g. replacing a state or transition with a sub-state machine, execution of parts of a state machine, unit testing) and had a couple of nice features (e.g. random walks through the state machine, execution of the unit tests against the real system). So I think it is possible to develop software like this. The question is just whether it is practical. I think the best way to get a feeling for Black Box Programming is to try it out. If you have a project in mind that can be open-sourced, get in touch and we’ll try it out together!