Why do software developers focus so much on the inside of the system when what we really want is to correctly implement the system as seen from the outside? Is it possible to first write the code for the external behavior, and then tweak the inside? Maybe, but we might need to rethink.
I’ve developed reactive systems with asynchronous input, such as telephony systems, for many years. Over time, I have become increasingly puzzled by the way we develop these systems. A few years ago, I realized what was bothering me.
In the development of a reasonably complex piece of software, at least three roles are involved: requirements engineering, software engineering and quality engineering. The requirements people look at the system from the outside. They view it as a black box and define its behavior by its incoming and outgoing signals (as well as non-functional requirements). Testers also look at the system from the outside. They inject signals and verify the expected outgoing signals (as well as non-functional requirements). The software developer wants to implement a system that corresponds to the requirements. Thus, it would be natural for the developer to think of the system as a black box (ignore the internals!) and to start out by implementing the observable behavior, perhaps by describing the incoming and outgoing signals of the system as a state machine. For most non-trivial systems, this is not what we do.
Instead, the programmer focuses on what is inside the system. Implicitly, and perhaps scattered over hundreds of thousands of lines of code, various sub-routines define the logic and outgoing signals of our system. The externally observable behavior is a side effect of these sub-routines. In any non-trivial system, it is almost impossible to verify correctness by visual inspection. We make our best effort to test our software to ensure correct behavior, with the cost that this incurs. Focusing on the inside of the system is only natural, since many factors affect the source code (e.g. non-functional requirements such as performance, scalability and robustness). Natural or not, it results in systems that are difficult to implement correctly. Thus, the question here is: can we do anything about it? Would it be possible to explicitly describe the externally observable behavior in a single place in our code while still being able to satisfy non-functional requirements?
What would this code look like? It would describe the logic of the system: incoming and outgoing signals and the necessary control flow, along with some house-keeping data. Let’s call it the Black Box description of the system. It would not contain any implementation details (threading, database storage, performance tweaks, etc.), since implementation details are not necessary to describe the external functional behavior of the system. Everything would be described in the domain language. Let’s take an example: assume we are implementing an ATM with this behavior. It would have incoming and outgoing signals towards the user interface (user input and text on the display), the bank’s server (requests and responses) and the ATM hardware itself (card inserted, eject card, etc.). The number of failed PIN attempts would be stored in a variable (house-keeping data). The “Too many invalid PINs” transition would read this variable.
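To make this concrete, here is a minimal sketch of what a Black Box description of the ATM could look like (Python; the names and the limit of three PIN attempts are hypothetical). Incoming signals are method calls, outgoing signals go through an abstraction, and the failed-PIN counter is the house-keeping data:

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    WAIT_FOR_PIN = auto()
    AUTHENTICATED = auto()
    CARD_BLOCKED = auto()

MAX_PIN_ATTEMPTS = 3  # assumed limit, for illustration only

class AtmBlackBox:
    """Externally observable behavior only: incoming signals are method
    calls, outgoing signals are emitted through the `out` abstraction."""

    def __init__(self, out):
        self.out = out              # abstraction for outgoing signals
        self.state = State.IDLE
        self.failed_attempts = 0    # house-keeping data

    def card_inserted(self):        # incoming signal from the hardware
        if self.state is State.IDLE:
            self.failed_attempts = 0
            self.state = State.WAIT_FOR_PIN
            self.out.show_text("Enter PIN")

    def pin_entered(self, ok):      # `ok` abstracts the bank's verdict
        if self.state is not State.WAIT_FOR_PIN:
            return
        if ok:
            self.state = State.AUTHENTICATED
            self.out.show_text("Welcome")
        else:
            self.failed_attempts += 1
            if self.failed_attempts >= MAX_PIN_ATTEMPTS:
                # the "Too many invalid PINs" transition
                self.state = State.CARD_BLOCKED
                self.out.retain_card()
                self.out.show_text("Too many invalid PINs")
            else:
                self.out.show_text("Wrong PIN, try again")
```

Note that nothing here mentions threading, databases or networking; the sketch is pure domain logic driven by signals.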
We write the Black Box code so that it is executable in isolation. This means the Black Box code must only talk to abstractions, and never directly to code that contains implementation details of the system (networking, platform specifics, optimizations, etc.). Executable in isolation also means it is unit testable in isolation. By testing the behavior of the Black Box code, we can verify the logic of the system. Obviously, other kinds of tests are required to verify the full implementation (state stored in databases, networking behavior, etc., not to mention non-functional requirements). An executable Black Box would be a tremendous advantage. We would not need a full implementation to be able to try out our system. Testers could start verifying, and integration tests could begin, very early on. We could also do rapid prototyping.
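As a sketch of what “only talk to abstractions” might look like in practice (Python; all names are hypothetical): the Black Box depends on an outgoing-signal interface, and a unit test supplies a recording test double where the real system would plug in its networking or UI code.

```python
from typing import Protocol

class Display(Protocol):
    """Abstraction for an outgoing signal. The real system implements
    this with hardware or networking details; tests use a double."""
    def show(self, text: str) -> None: ...

class GreeterBlackBox:
    """Toy Black Box: one incoming signal, one outgoing signal."""
    def __init__(self, display: Display):
        self.display = display

    def user_arrived(self, name: str) -> None:   # incoming signal
        self.display.show(f"Hello, {name}")      # outgoing signal

class FakeDisplay:
    """Test double that records outgoing signals for assertions."""
    def __init__(self):
        self.shown: list[str] = []

    def show(self, text: str) -> None:
        self.shown.append(text)
```

In a unit test we inject `FakeDisplay`, send incoming signals, and assert on the recorded outgoing signals; the Black Box itself never learns whether it is talking to a test double or the real thing.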
Many systems can be implemented simply by a state machine like the one in the ATM example. But large systems tend to be much more complex than that. First of all, we need to be able to address non-functional requirements. There are also some implementation issues to consider. I wouldn’t be able to list all challenges, but to get a feeling for how some of the issues can be addressed, let’s mention a couple of them:
- Asynchronicity: Say we are writing a system that requires authentication. As an implementation detail, we may choose to query another machine over the network. Thus, asynchronous results are inevitable. We don’t want the Black Box to reveal this (since our authentication procedure does not concern the end user). In our implementation, we could adapt our Black Box state machine by introducing a sub-state machine in which we wait for the response from the other machine. This requires us to be able to extend or substitute a state with a sub-state machine.
- Performance: What if multiple threads are involved in the execution of the Black Box? We might have to divide the state machine so that parts of it are executed in one thread and other parts in another thread (or even in different processes or on different machines). Here, too, we need asynchronous communication between threads, much like the above. We also need the ability to execute only parts of the state machine.
- State: Assume the flow of the state machine does not depend only on input signals but also on some other piece of data. For example, it could be a database query that answers whether a user is registered or not. Somewhere in the Black Box description we might call a function isUserRegistered(). In the implementation, we will use a real database. When executing the Black Box during testing, we let isUserRegistered() return pre-determined values for different test cases, very much like a mock object.
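To illustrate the sub-state machine point, here is one possible sketch (Python; names are hypothetical): the Black Box exposes a single authenticating state, and the implementation refines it with a hidden sub-state that waits for the asynchronous server response.

```python
from enum import Enum, auto

class Top(Enum):
    IDLE = auto()
    AUTHENTICATING = auto()   # a single state in the Black Box view
    LOGGED_IN = auto()
    REJECTED = auto()

class AuthSub(Enum):
    # implementation-level sub-state machine hidden inside AUTHENTICATING
    WAITING_FOR_SERVER = auto()
    DONE = auto()

class Login:
    def __init__(self, auth_service):
        self.auth = auth_service   # abstraction over the remote machine
        self.state = Top.IDLE
        self.sub = None

    def credentials_entered(self, user, pw):     # incoming signal
        if self.state is Top.IDLE:
            self.state = Top.AUTHENTICATING
            self.sub = AuthSub.WAITING_FOR_SERVER
            self.auth.request(user, pw)          # may go over the network

    def auth_response(self, ok):                 # asynchronous callback
        if (self.state is Top.AUTHENTICATING
                and self.sub is AuthSub.WAITING_FOR_SERVER):
            self.sub = AuthSub.DONE
            self.state = Top.LOGGED_IN if ok else Top.REJECTED
```

From the outside, only the transition from authenticating to logged-in or rejected is observable; the waiting sub-state is an implementation detail.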
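The isUserRegistered() point can be sketched like this (Python; names are hypothetical): the query is injected as a function, so tests supply pre-determined answers while the production wiring would call a real database.

```python
class RegistrationFlow:
    """Black Box flow that branches on a piece of data, not a signal.
    The query is injected so tests can return canned answers while the
    real system plugs in a database lookup."""

    def __init__(self, is_user_registered):
        self.is_user_registered = is_user_registered

    def login_requested(self, user):
        # outgoing signals modeled as return values for brevity
        if self.is_user_registered(user):
            return "show_login_form"
        return "show_signup_form"
```

A test case simply passes a lambda with the desired answer, very much like a mock object, while production code would pass a function that performs the actual database query.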
To implement larger systems, we would combine the Black Boxes of our sub-systems into a larger whole. We would build a hierarchy of Black Boxes. Black Boxes would communicate with each other through incoming and outgoing signals, which translates into asynchronous signals or synchronous function calls, whichever is most appropriate. The combined system would also be unit testable. Unit testing the combined system would exercise the sub-systems, since the combined system’s behavior relies on the behavior of the sub-systems.
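One possible sketch of such a hierarchy (Python; names are hypothetical): two sub-black-boxes are combined by wiring the outgoing-signal abstraction of one to the incoming signal of the other, here via plain function calls.

```python
class Echoer:
    """Sub-black-box: each incoming signal produces one outgoing signal."""
    def __init__(self, send):
        self.send = send          # outgoing-signal abstraction

    def on_signal(self, msg):     # incoming signal
        self.send(("ack", msg))   # outgoing signal

class Combined:
    """Combined Black Box: A's outgoing signals become B's incoming
    signals; only B's output leaves the combined system."""
    def __init__(self, out):
        self.b = Echoer(out)
        self.a = Echoer(lambda sig: self.b.on_signal(sig))

    def on_input(self, msg):      # incoming signal of the whole system
        self.a.on_signal(msg)
```

Because the combined system’s behavior is produced by its parts, a unit test that drives `Combined` through its incoming signals exercises both sub-black-boxes at once.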
The idea of a Black Box description is definitely not new. There are tools for model-driven design that come very close to what is described above. For example, in Mentor Graphics’ BridgePoint, you can describe your Black Box behavior as a state machine and generate customized code using what is called a model compiler. I’ve seen several successful projects built on BridgePoint, so the concept seems viable. But most programmers feel most comfortable when the code is at the center of things: you want full control of your code. This may also be a contributing reason why model-driven design has not taken off. So, the question here is really what can be done without advanced tools.
I developed a framework for Black Box Programming a few years ago. It addressed the challenges described above (e.g. replacing a state or transition with a sub-state machine, executing parts of a state machine, unit testing) and had a couple of nice features (e.g. random walks in the state machine, execution of the unit tests against the real system). So I think it is possible to develop software like this. The question is just whether it is practical. I think the best way to get a feeling for Black Box Programming is to try it out. If you have a project in mind that can be open-sourced, get in touch and we’ll try it out together!