Why is Clean Code important?
Code that is not well written is hard to read and understand. It’s a mess. Working with a mess is hard and prevents programmers from making changes easily. If it isn’t tidied up, it will eventually deteriorate into unsupportable goop, and you can’t sell goop. This is why Clean Code is important.
In his book Clean Code, Robert C. Martin tries to answer some questions. Why do software projects become harder and harder to change? Why does our productivity drop as the project goes from release to release? Robert explains that it happens because programmers write bad code. There is no one to blame but ourselves. If we write a mess, we can’t keep going as fast as we did at the start. By reading his book, I hope to learn how to write clean code, which in turn should allow me to be more professional when I write code.
What is Clean Code?
“Clean code is code that looks like it was written by somebody who cares.”
– Michael Feathers
“You know you are reading clean code when each routine that you read is pretty much what you expected.”
– Ward Cunningham
Some Practical Refactoring Tips
“Classes hide in long functions.”
Robert means that when you look at a long function that uses lots of variables in various different (functional) ways, you’re probably looking at a piece of code that should be a class. The suggested approach is to move that code into its own class before doing any further refactoring. The original function still exists, but now it just creates an instance of the new class and calls invoke(). The new invoke() method simply contains the moved block of code. The name is only a placeholder: once the purpose of the class is well-defined, invoke() can be renamed to something that will be clear to any reader of the code. So after this step, you have a block of code in the invoke() method of a new class. The next step is to move the variables used in that block out and make them class members, one variable at a time. At the same time, the large invoke() method can be broken up into methods that each do one thing (more on that below).
I think this sounds like a great tip. I’ve often wondered where to begin when refactoring code. At work I have a class that is several thousand lines long and has lots of really long methods. It’s been calling out to be refactored for some time! I intend to use this practical piece of advice on that class.
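To make those steps concrete, here’s a minimal sketch in Python (chosen only for brevity; Robert’s screencasts use Java and my work code is C++). The `report_totals_before` function, the `TotalsReporter` class and its helper methods are hypothetical names I made up for illustration, not code from the book or the videos. The technique itself is often called Replace Method with Method Object.

```python
# Hypothetical "before" code: one function whose locals are tangled together.
def report_totals_before(orders):
    total = 0.0
    largest = 0.0
    lines = []
    for name, amount in orders:
        total += amount
        if amount > largest:
            largest = amount
        lines.append(f"{name}: {amount:.2f}")
    lines.append(f"Total: {total:.2f}")
    lines.append(f"Largest: {largest:.2f}")
    return "\n".join(lines)


# Step 1: the body moved into invoke() on a new class.
# Step 2: the former locals became members (one at a time), which lets
# invoke() be split into small methods that share state via self.
class TotalsReporter:
    def __init__(self, orders):
        self.orders = orders
        self.total = 0.0
        self.largest = 0.0
        self.lines = []

    def invoke(self):
        # Placeholder name; rename once the class's purpose is clear.
        for name, amount in self.orders:
            self.record(name, amount)
        self.add_summary()
        return "\n".join(self.lines)

    def record(self, name, amount):
        self.total += amount
        self.largest = max(self.largest, amount)
        self.lines.append(f"{name}: {amount:.2f}")

    def add_summary(self):
        self.lines.append(f"Total: {self.total:.2f}")
        self.lines.append(f"Largest: {self.largest:.2f}")


def report_totals(orders):
    # The original entry point survives as a thin wrapper.
    return TotalsReporter(orders).invoke()
```

Because the running totals live in members, `record()` and `add_summary()` don’t need them passed in as parameters, which is what makes breaking up invoke() cheap.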
“Lots and lots of little well-named functions will save the team lots of time in the long run.”
Well-named functions act as little signposts within your code and make it much easier for other developers to navigate. It’s rare that you work on a project on your own, and even if you do, those well-named functions will guide your future self through the code when you come back to it, which you inevitably will.
“Functions should only do one thing”
However, how do you know when a function does one thing? He explains that you should “extract until you drop”. This means that if you can extract functionality from a function, you should; otherwise, that function is by definition doing more than one thing.
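Here’s a small sketch of what “extract until you drop” might look like, again in Python for brevity. The greeting example and every function name in it are hypothetical; the point is only that each extracted function does one thing and its name acts as a signpost.

```python
# Before: one function that normalises, validates and formats.
def greet_before(raw_name):
    name = raw_name.strip()
    if name:
        if name.isalpha():
            return "Hello, " + name.capitalize() + "!"
        else:
            return "Hello, stranger!"
    else:
        return "Hello, stranger!"


# After extracting until there is nothing left to extract.
def greet(raw_name):
    name = normalise(raw_name)
    return greeting_for(name) if is_valid_name(name) else anonymous_greeting()


def normalise(raw_name):
    return raw_name.strip()


def is_valid_name(name):
    # "".isalpha() is False, so the empty-name case is covered here too.
    return name.isalpha()


def greeting_for(name):
    return f"Hello, {name.capitalize()}!"


def anonymous_greeting():
    return "Hello, stranger!"
```

The nested if/else disappears because the empty-name check turned out to be a special case of the validity check once it was pulled into its own function.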
Show me an example!
I’ve watched him refactor long functions in a number of screencasts from his cleancoders.com video series. He rewrites code that has lots of indentation into short functions with well-chosen names. He relies heavily on the IntelliJ IDE’s refactoring features in his screencasts. These are built specifically to automate moving blocks of code around in useful ways, such as extracting a block of code from one function into a new function, or even into an entirely new class if needed. I’m interested to find out whether IntelliJ can be used this way for C++; perhaps Visual Assist, the Visual Studio plugin I use at work, can do the same. However, the key is knowing the refactoring techniques themselves.
He declares that functions should be limited to four, or at most five, lines. This, he says, helps to increase readability, because you can’t fit heavily indented code into four lines!
By the end of the screencasts, the code reads a lot better. By this I mean that there is minimal indenting, and usually there is no need for curly braces around the body of an if block or a for loop. Switch statements are removed and replaced with polymorphic classes using the Template pattern.
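A minimal Python sketch of swapping a switch-style conditional for polymorphic classes is below. The shapes example is hypothetical, not the code from the screencasts. `Shape.describe()` is written as a template method (a fixed skeleton that calls steps the subclasses override), which is one reading of “Template pattern”; I can’t confirm this is exactly how Robert structures it.

```python
import math
from abc import ABC, abstractmethod


# Before: behaviour selected by branching on a "kind" field.
def describe_before(shape):
    kind = shape["kind"]
    if kind == "circle":
        area = math.pi * shape["radius"] ** 2
    elif kind == "square":
        area = shape["side"] ** 2
    else:
        raise ValueError(f"unknown shape: {kind}")
    return f"{kind} with area {area:.2f}"


# After: each branch becomes a subclass, so the conditional disappears.
class Shape(ABC):
    def describe(self):
        # Template method: the fixed skeleton lives here, and each
        # subclass supplies only the steps that vary.
        return f"{self.name()} with area {self.area():.2f}"

    @abstractmethod
    def name(self): ...

    @abstractmethod
    def area(self): ...


class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius

    def name(self):
        return "circle"

    def area(self):
        return math.pi * self.radius ** 2


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def name(self):
        return "square"

    def area(self):
        return self.side ** 2
```

Adding a new shape now means adding a class rather than editing every conditional that switches on `kind`.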
Before refactoring code, Robert always has a set of automated regression tests. It’s well known why he does this: to ensure that the refactoring he is about to perform doesn’t break the existing functionality! I found one example particularly interesting. He was refactoring a single function that calculated prime numbers and printed them as neatly formatted output. There were no existing unit tests for this code, so his approach was to run the function and capture a “golden” output for a set of inputs. He then wrote a test that called the function and checked that it produced this golden output; if it did, the test passed. Another name for this kind of test is a “Characterisation Test”. It’s a novel approach to developing a suite of tests on the fly for a piece of code you wish to refactor.
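Here’s a rough Python sketch of a characterisation test. `print_primes` is a hypothetical stand-in for the legacy function (primes up to a limit, printed four per row), not Robert’s actual code. The golden string is what you would get by running the untouched function once and pasting its output into the test.

```python
import io
from contextlib import redirect_stdout


def print_primes(limit):
    """Hypothetical legacy code: prints primes <= limit, four per row."""
    primes = []
    for candidate in range(2, limit + 1):
        # Trial division by the primes found so far, up to sqrt(candidate).
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
    for i in range(0, len(primes), 4):
        print("".join(f"{p:4d}" for p in primes[i:i + 4]))


def capture_output(func, *args):
    # Redirect stdout into a buffer so the printed text can be compared.
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        func(*args)
    return buffer.getvalue()


# Recorded once from the existing code, before any refactoring.
GOLDEN_20 = "   2   3   5   7\n  11  13  17  19\n"


def test_print_primes_matches_golden_output():
    assert capture_output(print_primes, 20) == GOLDEN_20
```

The test knows nothing about how primes are computed; it only pins down the observable output, which is what makes it safe to refactor the internals underneath it.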
However, these auto-generated tests are rigid and fragile. Sure, they test the code and will catch it when you break something, but they are tied to how the data is output and formatted. In other words, if you wanted to change the formatting of the output, your tests would break. In my view, that means that once Robert was happy with the refactoring of the prime number generator, he should go back and refactor his tests to make them clean too!
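One way that cleanup could look, sketched in Python under my own assumptions: split generating primes from formatting them, then test each part separately. `primes_up_to` and `format_rows` are hypothetical names, not functions from the screencast.

```python
def primes_up_to(limit):
    """Pure generation step, kept separate from any output formatting."""
    primes = []
    for candidate in range(2, limit + 1):
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
    return primes


def format_rows(numbers, per_row=4, width=4):
    """Lay numbers out in right-aligned rows; purely a display concern."""
    rows = []
    for i in range(0, len(numbers), per_row):
        rows.append("".join(f"{n:{width}d}" for n in numbers[i:i + per_row]))
    return "\n".join(rows)


# Behavioural test: survives any change to the layout.
def test_primes_up_to():
    assert primes_up_to(1) == []
    assert primes_up_to(20) == [2, 3, 5, 7, 11, 13, 17, 19]


# A single focused test owns the formatting decision.
def test_format_rows():
    assert format_rows([2, 3, 5], per_row=2, width=3) == "  2  3\n  5"
```

With this split, changing the column width only breaks `test_format_rows`, while the tests about which numbers are prime stay green.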