The Art of Abstraction

 

Our task as developers is to write code that abstracts a real-world concern into atomic concepts that a computer can understand and process. But what else do we abstract, and how can it help us write better code? I think abstraction can be applied to many things, as long as we use it in a way that improves what we do and does not add too much complexity.

First things first: in this post I give my opinion and thoughts on how I tackle the task of writing maintainable code. There are many ways to write beautiful code, and like every developer I keep learning, so this post is a snapshot of how I look at things right now.

Definition of Abstraction

We as developers take real-world problems and try to solve them using algorithms and procedures. For a computer to be able to solve a problem, we usually break it apart. A blog, for example, can be defined as a way to deliver multiple articles written by someone (the author). To implement this, however, we split the blog into simple atomic tasks, for example: create post, edit post, delete post, list posts. We have now increased the number of problems we want to solve, but each of them is individually easier to solve. In basic computer science this is also called “divide and conquer”. We also abstracted multiple concepts out of the big concept “blog”. In this case, abstraction made our problem easier to solve.

The definition of abstraction is basically to reduce the complexity of something. We seek the essence of what we abstract. Artists create abstract images by reducing details in a way that preserves the essence of what is shown. A mathematician seeks the essence of a mathematical concept, like breaking a pattern down into a simple formula. We developers try to break complicated problems down into their essential parts so that we can tackle each partial problem one by one.

Let’s return to the blog. We have multiple actions we need to perform in a blog, but implementing each of them can also be a complex problem. Looking at the edit step, we need to make sure it fetches the wanted blog post, allows us to manipulate it and saves it back to our storage. And while we get the post from the storage, we may need to make sure the user is permitted to edit this post. We break our problem of editing down into the most basic steps, like sending a request to our storage or verifying that our user has the permission to edit the post. If we imagine this as a tree, we can see a structure of problems and their child problems.

[Figure: the blog broken down into a tree of problems and their child problems]

Don’t Repeat Yourself

We now have a version of our blog broken down into abstract partial problems. One thing we can see in the graphic above is that some problems appear multiple times. Let’s see the blog as a class and create, edit, delete and list as methods. We could solve each of the partial problems inside those methods, but then we would repeat code.

Clean code rules teach the DRY (Don’t Repeat Yourself) principle. It simply encourages you to write code that does the same thing only once. Every developer should follow it and write reusable methods as much as they can. One benefit, aside from the fact that we devs are lazy, is that a change has to be made in only one place in the code. Most of the time this also means fixing a bug only once.

To apply the DRY principle, you need to abstract common tasks. According to our definition, you should find the essence of a solution and implement it once in a method. In this example our repetition is obvious, but sometimes you come across concepts that seem different but can be broken down further into identical solutions. My mind goes on alert when I see a method that does a couple of things and differs from other methods in only a few lines, for example a different regular expression. I then extract the shared part into one method and find a way to pass in the difference, so that it can be applied to the rest of the code.

Let’s focus on the create and edit methods here and isolate the repeating parts into a new set of methods. In reality I would even put them into a new class and generalize them so I could use them later in other parts of my code.
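
As a minimal sketch of the regular expression case, assume two checks that differ only in their pattern. The function names and both patterns are hypothetical and only serve the illustration; the shared logic moves into one function, and the difference is passed in as a parameter.

```php
<?php

// Before (hypothetical): two functions that were identical except for the pattern.
// function isValidSlug($value)     { return is_string($value) && preg_match('/^[a-z0-9-]+$/', $value) === 1; }
// function isValidUsername($value) { return is_string($value) && preg_match('/^[A-Za-z0-9_]{3,20}$/', $value) === 1; }

// After: the shared logic lives in one place and the difference is a parameter.
function matchesPattern($value, $pattern)
{
    return is_string($value) && preg_match($pattern, $value) === 1;
}

// Lowercase letters, digits and dashes only.
function isValidSlug($value)
{
    return matchesPattern($value, '/^[a-z0-9-]+$/');
}

// 3 to 20 letters, digits or underscores.
function isValidUsername($value)
{
    return matchesPattern($value, '/^[A-Za-z0-9_]{3,20}$/');
}
```

If the way a value is matched ever has to change, for example to reject non-string input differently, only matchesPattern needs to be touched.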

[Figure: create and edit with their repeating parts extracted into a new set of methods]
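
In code, the extraction could look roughly like the following sketch. The method names, the shape of the user and post arrays and the in-memory $posts array (a stand-in for the real storage) are all assumptions made for illustration. The point is only that create and edit call the same checks instead of each carrying a copy.

```php
<?php

// Hypothetical class holding the checks that create() and edit() share.
class Validator
{
    // Assumption: a user is an array with a 'canEdit' flag.
    public function assertUserMayEdit(array $user)
    {
        if (empty($user['canEdit'])) {
            throw new Exception('User is not permitted to edit posts.');
        }
    }

    // Assumption: a post needs a non-empty title and text.
    public function assertValidPost(array $post)
    {
        foreach (['title', 'text'] as $field) {
            if (!isset($post[$field]) || trim($post[$field]) === '') {
                throw new Exception("Missing field: $field");
            }
        }
    }
}

class Blog
{
    protected $validator;
    protected $posts = []; // stand-in for the real storage

    public function __construct(Validator $validator)
    {
        $this->validator = $validator;
    }

    public function create(array $user, array $post)
    {
        $this->check($user, $post);
        $post['id'] = count($this->posts) + 1;
        $this->posts[$post['id']] = $post;
        return $post['id'];
    }

    public function edit(array $user, $id, array $changes)
    {
        $post = array_merge($this->find($id), $changes);
        $this->check($user, $post);
        $this->posts[$id] = $post;
    }

    public function find($id)
    {
        if (!isset($this->posts[$id])) {
            throw new Exception("No post with id $id");
        }
        return $this->posts[$id];
    }

    // The steps create() and edit() have in common, written once.
    protected function check(array $user, array $post)
    {
        $this->validator->assertUserMayEdit($user);
        $this->validator->assertValidPost($post);
    }
}
```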

Single Responsibility Principle

We have now made each problem unique, but we should still sense some similarity between some of the problems we solve. We already made a new class for validation and put methods in there. But looking at the remaining problems in our blog class, we can see that they all have to do with getting or saving information.

The SRP (Single Responsibility Principle) is a rule that encourages us to think about abstracting out general solutions. The exact definition of this principle states that a responsibility is a reason to change a class. The SRP therefore demands that each class has only one reason to change. We have already extracted the responsibility of validating something. If we want to change the way we validate something, we edit only the Validator class. However, we could also say that the reason to change the Validator class is either to change the way we check for security or to change the way we check for valid input, which would be two reasons. We have to define what a reasonable responsibility is in each case. Our goal is to find an abstract, bigger concern and put it in its own class.

Thinking about the image above, we can see that we have the responsibility of communicating with the storage. We could imagine that next to the blog there should be a user management that provides the rights we want to check. In this case it makes sense to abstract the storage handling into a new class, BlogRepository. We have now abstracted almost all our single problems into their own classes with their own general solutions. These classes are ready to be used in the next module we want to create, and in case we introduced a bug somewhere, we can fix it in a single file of our application.

[Figure: create and edit after the storage handling has been moved into BlogRepository]
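
A rough sketch of such a BlogRepository follows. Only the class name comes from the text above. The method names are assumptions, and an array stands in for the real storage so that the sketch runs on its own. What matters is that this is the only class that knows how posts are stored.

```php
<?php

// Hypothetical repository: the single place that talks to the storage.
// An in-memory array replaces the database for the purpose of this sketch.
class BlogRepository
{
    protected $posts = [];
    protected $nextId = 1;

    public function find($id)
    {
        if (!isset($this->posts[$id])) {
            throw new Exception("No post with id $id");
        }
        return $this->posts[$id];
    }

    public function findAll()
    {
        return array_values($this->posts);
    }

    // Stores a new post (no id yet) or overwrites the post with the given id.
    public function save(array $post)
    {
        if (!isset($post['id'])) {
            $post['id'] = $this->nextId++;
        }
        $this->posts[$post['id']] = $post;
        return $post['id'];
    }

    public function delete($id)
    {
        unset($this->posts[$id]);
    }
}

// Usage: the calling code never sees how or where the post is stored.
$repository = new BlogRepository();
$id = $repository->save(['title' => 'Hello', 'text' => 'World']);
$post = $repository->find($id);
$post['title'] = 'Hello again';
$repository->save($post);
```

Swapping the array for a database later would change this class only, which is the one reason to change that the SRP asks for.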

Giving the methods a KISS

We have now created an architecture that abstracted our bigger problems out of our sight to places where they belong. But what is still missing is our implementation. We simply arranged everything our module needs but each problem could still take many lines of code to implement.

Another famous principle is KISS (Keep It Simple, Stupid). It states that everything we do should be made as simple as possible, so that it is also easy for other developers to understand. To reduce the complexity, we can abstract the atomic parts and divide the problem into multiple simpler problems.

Let’s take the “save” part of our BlogRepository as an example. First we need to think about what it takes to save both a freshly created and an edited blog post.
We also have to take into account that a new blog post should use an INSERT query while an edited post should use an UPDATE query. We also assume that our post is an array with one entry per field, but that we have no control over the order of the fields. We want to make sure to put the right data in the right place.

public function save($post)
{
    $servername = "localhost";
    $username = "username";
    $password = "password";
    $dbname = "testDatabase";

    $conn = new mysqli($servername, $username, $password, $dbname);
    if ($conn->connect_error) {
        die("Connection failed: " . $conn->connect_error);
    }

    // A post that already has an id exists in the table and is updated;
    // a post without an id is new and is inserted.
    if (isset($post['id'])) {
        $sql = "UPDATE blogpost SET autor='" . $post['autor'] . "', title='" . $post['title'] . "', text='" . $post['text'] . "' WHERE id=" . $post['id'];
    } else {
        $sql = "INSERT INTO blogpost (autor, title, text)
                VALUES ('" . $post['autor'] . "', '" . $post['title'] . "', '" . $post['text'] . "')";
    }
    if ($conn->query($sql) !== TRUE) {
        throw new Exception("Error: " . $sql . "\n" . $conn->error);
    }
    $conn->close();
    return true;
}

Setting aside that this is in fact terrible code, it is a naive approach. We do lots of stuff here: first we create a database connection, then we test the connection. After that we determine if our post has an id. If it has no id, we insert a new post, else we update the post. We then execute the query, close the connection and return true. Every one of you should see that this is not only badly written, it also does too much inside one method.

Let’s abstract some concepts here while keeping the code and its logic the same. The first task is establishing the connection. We can simply write a method getDatabaseConnection to extract this. Next we create the query strings, depending on whether we want to insert a new post or update an existing one. We want to keep the logic the same, but the query creation can be extracted. However, thinking about the code, we want to not only create the queries but also execute them. We extract the execute part into a method “execute”, and the queries into the methods update and insert, which each call the execute method. And while we are at it, we get rid of the return value and treat the method as void, since we throw an exception if something goes wrong.

// Connection shared by the methods below; set in save().
protected $conn;

public function save($post)
{
    $this->conn = $this->getDatabaseConnection();

    if (isset($post['id'])) {
        $this->update($post);
    } else {
        $this->insert($post);
    }
}

protected function getDatabaseConnection(
    $servername = 'localhost',
    $username = 'username',
    $password = 'password',
    $dbname = 'testDatabase')
{
    $conn = new mysqli($servername, $username, $password, $dbname);
    if ($conn->connect_error) {
        throw new Exception("Connection failed: " . $conn->connect_error);
    }
    return $conn;
}

protected function insert($post)
{
    $sql = "INSERT INTO blogpost (autor, title, text)
            VALUES ('" . $post['autor'] . "', '" . $post['title'] . "', '" . $post['text'] . "')";
    $this->execute($sql);
}

protected function update($post)
{
    $sql = "UPDATE blogpost SET autor='" . $post['autor'] . "', title='" . $post['title'] . "', text='" . $post['text'] . "' WHERE id=" . $post['id'];
    $this->execute($sql);
}

// Runs the query on the connection opened in save() and closes it afterwards.
protected function execute($sql)
{
    if ($this->conn->query($sql) !== TRUE) {
        throw new Exception("Error: " . $sql . "\n" . $this->conn->error);
    }
    $this->conn->close();
}

Now we have split our save method into five methods without noticeably altering the logic behind it. While this is still not the way something like this should be done, we can now read the code much more easily from top to bottom. We start with save, our only public method, and each step is self-explanatory. We use proper naming so that our code reads like text in a book. We abstracted the complexity of one method into its atomic parts, and no method contains more than 10 lines of code.
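
One reason the code is still not how it should be done (my reading, the refactoring above does not address it) is that the post data is concatenated straight into the SQL string. A possible next step is a mysqli prepared statement. The sketch below keeps the table and column names from the example, while the function names buildSaveQuery and savePost are hypothetical. Building the query is separated from running it, so the first part can be read and checked without a database.

```php
<?php

// Hypothetical helper: returns the SQL with placeholders, the mysqli type
// string and the values to bind. No post data ends up in the SQL itself.
function buildSaveQuery(array $post)
{
    if (isset($post['id'])) {
        return [
            'UPDATE blogpost SET autor = ?, title = ?, text = ? WHERE id = ?',
            'sssi',
            [$post['autor'], $post['title'], $post['text'], (int) $post['id']],
        ];
    }

    return [
        'INSERT INTO blogpost (autor, title, text) VALUES (?, ?, ?)',
        'sss',
        [$post['autor'], $post['title'], $post['text']],
    ];
}

// Runs the query from buildSaveQuery() as a prepared statement.
function savePost(mysqli $conn, array $post)
{
    list($sql, $types, $values) = buildSaveQuery($post);

    $statement = $conn->prepare($sql);
    if ($statement === false) {
        throw new Exception('Prepare failed: ' . $conn->error);
    }

    // The values are sent separately from the SQL text.
    $statement->bind_param($types, ...$values);
    if (!$statement->execute()) {
        throw new Exception('Execute failed: ' . $statement->error);
    }
    $statement->close();
}
```

Because the array is accessed by key, the order of the fields in $post does not matter; the right data still ends up in the right place.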

Conclusion

We can abstract code in various ways. Looking at the result of our development effort, we start to see what we do from different angles. Each angle offers a new way to optimize the code in one way or another. However, the line between optimizing the code to be useful and overengineering it until it becomes a golden mule is a narrow one. Everything we optimize should start in our head, by thinking about what the essence of the problem with the current code is. Even the thought process has to be abstracted to its very essence.

I have simply shown here what goes on when I try to optimize code and what, for me, results in something I consider reasonable code. Lately I have been thinking about how I sometimes overdo this whole abstraction process in a gold-plating frenzy. This is an antipattern, and reflecting on it could be considered vital learning. You should always reflect on what you do and keep your mind open to ideas. Don’t swallow everything other people say; adapt it and improve it in the way that works for you.


Florian Tatzel

I studied computer science with a focus on media and currently work as a web developer at a TYPO3 agency in Germany. I like open source projects and contribute where I can. I also like sharing knowledge and discussing hot topics in the business, like workflow through virtualisation and code quality.