Abstraction vs Compression

In our daily communication you might hear things like “higher level of abstraction”, but what is abstraction? And how does it relate to compression?

Wikipedia says this about Abstraction: “Abstractions may be formed by reducing the information content of a concept or an observable phenomenon, typically to retain only information which is relevant for a particular purpose.” For example, a ball is an abstraction of a football and other types of balls. Abstraction in software development is about removing details from something concrete; there’s a loss of information. For example, a class for sending packets over a TCP socket is a concrete concept. An abstraction could be a Network interface with a send function. In the abstraction, we remove the details of exactly how the data is sent.
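As a minimal sketch of the example above, written in Python: the class name `TcpPacketSender`, the function `publish`, and the exact `send` signature are assumptions made for illustration, not taken from any particular code base.

```python
import socket
from typing import Protocol


class TcpPacketSender:
    """Concrete concept: every detail of how the data is sent lives here."""

    def __init__(self, host: str, port: int) -> None:
        self.host = host
        self.port = port

    def send(self, data: bytes) -> None:
        # Opens a TCP connection and writes all bytes to it.
        with socket.create_connection((self.host, self.port)) as conn:
            conn.sendall(data)


class Network(Protocol):
    """Abstraction: only the fact that data can be sent is retained."""

    def send(self, data: bytes) -> None: ...


def publish(network: Network, message: str) -> None:
    # Nothing here reveals whether TCP, UDP or something else is used.
    network.send(message.encode("utf-8"))
```

Reading `publish` alone, there is no way to recover how the bytes travel. That detail has been removed, not just tucked away.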

Compression, on the other hand, is about hiding information. Although not visible, the information is there and can be retrieved if necessary (like unzipping compressed content). One example is procedural programming: Reading a function should give you a good picture of what the function does. If you need more information, you can always go into the called functions for details. No information is lost.
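A small sketch of what this could look like in procedural code. The order-processing functions below are hypothetical and exist only to illustrate the point.

```python
def parse_order(line: str) -> tuple[str, int]:
    # Splits "name, quantity" into its two parts.
    name, quantity = line.split(",")
    return name.strip(), int(quantity)


def order_total(quantity: int, unit_price: int) -> int:
    # Total price for the given quantity.
    return quantity * unit_price


def process_order(line: str, unit_price: int) -> str:
    # The top-level function reads as a summary of the work.
    name, quantity = parse_order(line)
    total = order_total(quantity, unit_price)
    return f"{name}: {total}"
```

`process_order` gives the overall picture, and each call names exactly one function whose body you can open for the details. The details are hidden, but nothing is lost.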

Using an interface in object-oriented programming introduces an abstraction. When reading code that talks to an interface, you generally cannot know which class will implement it at runtime. You lose information. This can make your object-oriented code hard to understand, review, and so on. Some may argue that this is not a problem: if your interfaces are clear, it does not matter who implements them. It should be sufficient to know that the implementing class carries out the work according to the specification of the interface (honoring the Liskov Substitution Principle). Still, the information loss can be a challenge.

Other uses of the word “abstraction” in software development may appear when people mention things like “programming at a higher level of abstraction”. The Network interface from above is a good example. It is on a higher level of abstraction than the more low-level TCP implementation (which itself hides raw socket operations). What if we had a Network class with a send function that itself implements the TCP socket sending? Is this an abstraction? The public functions of Network hide the details of TCP packet sending etc. By going into the Network class, we can retrieve all details of exactly how packets are sent. Information is hidden, but not lost. If no information is lost, this is rather “programming at a higher level of compression”. :)

After putting you through this, I must say that in our daily communication, we (myself included) don’t pay much attention to the distinction between abstraction and compression: “abstraction” normally means any kind of information hiding or removal. But it’s useful to know the difference since it affects the understanding and readability of your code.

Design Principles by Example: Talk to an Interface or an Abstraction?

What is the relation between the design principles “Talk to an interface, not an implementation” and “Talk to an abstraction, not a concrete”? When you apply them, you want to achieve different goals.

Two important design principles for writing good software are

  1. “Talk to an interface, not an implementation” and
  2. “Talk to an abstraction, not a concrete”.

Admittedly, they sound very much alike, so what is the difference between them?

Assume you are writing a client implementation that needs to communicate with a server somewhere. You have chosen to use a web socket for sending messages over the wire. To that end, you will use a class ClientWebSocketSenderImplementation. Now, the “Talk to an interface, not an implementation” design principle suggests that talking to the web socket implementation class directly is inappropriate. Instead, you should talk to an interface ClientWebSocketSender.

Following the first design principle has several upsides. First, it will make your code easier to test. In this case, using the web socket implementation directly would use the network. That would make your unit tests slow and unreliable and might require a complicated setup phase. Second, talking to the web socket implementation directly would couple your client code to that specific implementation. If needed, changing web socket implementations would be difficult. Also, your code would not be reusable without shipping the web socket implementation.
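The testing upside can be sketched roughly as follows, in Python. `ClientWebSocketSender` is the interface named in the text; `ChatClient`, `FakeWebSocketSender` and the `send(message)` signature are hypothetical names introduced only for this illustration.

```python
from typing import Protocol


class ClientWebSocketSender(Protocol):
    """The interface the client talks to."""

    def send(self, message: str) -> None: ...


class ChatClient:
    """Hypothetical client code; it knows only the interface."""

    def __init__(self, sender: ClientWebSocketSender) -> None:
        self._sender = sender

    def greet(self, user: str) -> None:
        self._sender.send(f"hello {user}")


class FakeWebSocketSender:
    """Test double: records messages instead of touching the network."""

    def __init__(self) -> None:
        self.sent: list[str] = []

    def send(self, message: str) -> None:
        self.sent.append(message)


# A unit test needs no server, no socket and no setup phase.
fake = FakeWebSocketSender()
ChatClient(fake).greet("alice")
assert fake.sent == ["hello alice"]
```

In production, the same `ChatClient` would be handed the real web socket implementation instead of the fake.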

We have chosen to send messages using a web socket. But there is really no need for our client application to know how messages are sent over the network. The second design principle says “Talk to an abstraction, not a concrete”. The concrete here is a web socket. A suitable abstraction in this context could be the ability to send messages over the web without specifying how. So we introduce an interface ClientWebSender. Depending on the application, we could take it one step further. It might make sense to abstract away the fact that we’re sending messages over the internet (for example, it could be over an IPC channel). We would end up with an interface ClientSender.

The second design principle will make your application more resilient to change. Without the abstraction, the web socket details might propagate throughout your code. For example, functions, arguments and return types could be specific to web sockets. If you would like to change your application to support message sending over e.g. HTTP or your own proprietary protocol over TCP, you would have to chase down all references to web sockets.
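To illustrate, here is a rough sketch in Python. `ClientSender` is the abstraction named earlier; the two transport classes and `report_status` are hypothetical stand-ins that only record what they would send, so no real web socket or HTTP library is involved.

```python
from typing import Protocol


class ClientSender(Protocol):
    """Abstraction: send a message, without saying how."""

    def send(self, message: str) -> None: ...


class WebSocketClientSender:
    """Stand-in for a web socket transport (records frames, no real I/O)."""

    def __init__(self) -> None:
        self.frames: list[str] = []

    def send(self, message: str) -> None:
        self.frames.append(f"WS-FRAME:{message}")


class HttpClientSender:
    """Stand-in for an HTTP transport (records requests, no real I/O)."""

    def __init__(self) -> None:
        self.requests: list[str] = []

    def send(self, message: str) -> None:
        self.requests.append(f"POST /messages {message}")


def report_status(sender: ClientSender, status: str) -> None:
    # Application code: no web socket types in arguments or return values.
    sender.send(f"status={status}")
```

Switching from web sockets to HTTP then means passing a different object to `report_status`; the application code itself stays untouched.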

Last, the “Talk to an abstraction, not a concrete” principle does not require us to talk to an interface. You might have a class that represents the “abstraction” part and hides the “concrete” part (e.g. by delegating to a web socket implementation). So our two design principles serve different purposes and do not necessarily overlap. That said, they work very well in combination to write decoupled, testable and change-resilient software.
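A class acting as the abstraction, with no interface involved, might look something like this Python sketch. `WebSocketConnection` and its `send_text_frame` method are hypothetical placeholders for a concrete web socket implementation, not the API of any real library.

```python
class WebSocketConnection:
    """Hypothetical concrete transport; records frames instead of sending."""

    def __init__(self, url: str) -> None:
        self.url = url
        self.frames: list[bytes] = []

    def send_text_frame(self, payload: bytes) -> None:
        self.frames.append(payload)


class ClientSender:
    """Abstraction as a plain class: callers see only 'send a message'."""

    def __init__(self, url: str) -> None:
        # The concrete part is created and kept private here.
        self._connection = WebSocketConnection(url)

    def send(self, message: str) -> None:
        # Delegates to the concrete implementation.
        self._connection.send_text_frame(message.encode("utf-8"))
```

Callers of `ClientSender` talk to an abstraction, yet they are still talking to an implementation: because the class builds its own connection, it cannot be swapped for a test double without further changes. That is where the first principle would come in.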