It was sitting on my shelf for quite a while. Finally, over the May bank holiday, I got enough time for focused reading. As I already shared on Twitter, it was worth every minute spent reading it.
What did I like about it?
First of all, how the author approached prioritising which tech debt should be paid first. It is not even about the exact approach he took to identify hot spots (one more exciting concept from the book), but rather that he used data-driven methods to find the places where effort spent improving code or its structure can be the most helpful. As the book rightly points out at the start, if you work on tech debt without initial prioritisation, you can end up needing 4000 years of engineering hours to fix it in all of the code.
So, what are those approaches?
1. Check which files have recently been modified by the most contributors
This can be a clear sign that a file receiving contributions from too many people at the same time contains too much logic coming from different places. There are various reasons why paying that tech debt can be beneficial to the team – from the most apparent one, fewer merge conflicts, to more hidden ones, such as it becoming easier to keep a mental model of this piece of code because it would no longer change so often.
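A minimal sketch of how such a count could be done, assuming the (author, file) pairs have already been extracted from something like `git log --pretty=format:'%an' --name-only` (the file names and authors here are made up for illustration):

```python
from collections import defaultdict

# Hypothetical sample of (author, file) pairs from recent history.
changes = [
    ("alice", "billing/invoice.py"),
    ("bob", "billing/invoice.py"),
    ("carol", "billing/invoice.py"),
    ("alice", "utils/dates.py"),
    ("alice", "billing/invoice.py"),
]

# Collect the set of distinct contributors per file.
authors_per_file = defaultdict(set)
for author, path in changes:
    authors_per_file[path].add(author)

# Rank files by how many different people touched them recently.
ranked = sorted(authors_per_file.items(), key=lambda kv: len(kv[1]), reverse=True)
for path, authors in ranked:
    print(path, len(authors))
```

The file at the top of the ranking is a candidate hot spot – not a verdict, just a place worth looking at.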
2. Check which files are the longest
One of the ways the author proposes to measure a file's complexity is its length. Here, as with many of the other approaches, Adam emphasises that data is only data, and we as engineers should decide what to do with it. In light of measuring complexity this way, some files will have good reasons to be as long as they are; others won't, and those are precisely the places to look at to identify meaningful tech debt to be paid.
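The measurement itself is trivial – which is part of the appeal. A toy sketch, with made-up in-memory "files" standing in for reading real ones from disk:

```python
# Toy stand-in for file contents; in practice you would read the real files.
sources = {
    "orders/service.py": "def place_order():\n" * 400,  # a suspiciously long file
    "orders/models.py": "class Order:\n" * 60,
}

# Count lines per file and sort longest-first.
line_counts = {path: text.count("\n") for path, text in sources.items()}
longest_first = sorted(line_counts.items(), key=lambda kv: kv[1], reverse=True)
```

The interesting step is not the counting but the human judgment afterwards: deciding which of the long files deserve to be long.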
3. Measure file complexity by the deepest indentation in the file
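The idea is that deeply nested code usually means deeply nested logic, and leading whitespace is cheap to measure in any language. A rough sketch of how one might compute it (my own simplification, not the book's exact implementation):

```python
def indentation_complexity(source: str, spaces_per_level: int = 4):
    """Return (max_depth, total_depth) using leading whitespace as a proxy.

    A tab counts as one level; every `spaces_per_level` spaces count as one.
    """
    max_depth = total_depth = 0
    for line in source.splitlines():
        stripped = line.lstrip(" \t")
        if not stripped:
            continue  # blank lines carry no logic
        leading = line[: len(line) - len(stripped)]
        tabs = leading.count("\t")
        depth = tabs + (len(leading) - tabs) // spaces_per_level
        max_depth = max(max_depth, depth)
        total_depth += depth
    return max_depth, total_depth

snippet = "if a:\n    if b:\n        do_it()\n"
print(indentation_complexity(snippet))  # deepest nesting is 2 levels
```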
4. Observe which files are most often changed together
The version control system can provide such insights as well. As with the other approaches, any tool will give us only data; it is our job to interpret that data correctly. For example, one thing that can be observed is that a file and its tests frequently change together. This information can be read in two distinct ways. It may be expected behaviour: we are currently adding new functionality to this place, so tests covering that new functionality should be added too. However, if no new feature was added there, it can be a sign that the tests are too specific or testing the wrong thing, so whenever someone refactors, they also need to change the tests.
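This change coupling can be sketched by counting how often each pair of files appears in the same commit. The commit history below is invented; in practice each set would come from the file list of one commit in `git log --name-only`:

```python
from collections import Counter
from itertools import combinations

# Hypothetical history: each entry is the set of files touched by one commit.
commits = [
    {"cart.py", "test_cart.py"},
    {"cart.py", "test_cart.py"},
    {"cart.py", "pricing.py"},
    {"pricing.py"},
]

# Count co-occurrences of every file pair across commits.
coupling = Counter()
for files in commits:
    for pair in combinations(sorted(files), 2):
        coupling[pair] += 1
```

Pairs with a high count and no obvious feature relationship are the surprising ones worth investigating.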
5. Check when each file was last modified
Another way to look at our codebase is to ask what percentage of the files was modified recently. If the distribution of last-modified dates is even, it is a good sign that we have modules that were written once and work predictably, so there was no need to modify them in the last N days/weeks/months. The call is always ours, though, as this can also be a sign that we have dead code in the codebase, or a piece that no one knows how it works, so no one is brave enough to modify any part of it.
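A small sketch of flagging code that has not been touched in over a year, assuming we already have each file's last commit date (the paths and dates are made up):

```python
from datetime import date

# Hypothetical map of each file to the date of its last commit,
# e.g. extracted from `git log -1 --format=%cs -- <path>`.
last_modified = {
    "core/engine.py": date(2019, 1, 10),
    "api/routes.py": date(2021, 5, 1),
    "api/auth.py": date(2021, 4, 20),
}

today = date(2021, 5, 3)

# Files untouched for more than a year: stable workhorses, or dead code?
stale = [path for path, d in last_modified.items() if (today - d).days > 365]
```

Whether `stale` files are reliable or abandoned is exactly the kind of interpretation the data cannot make for us.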
What we are looking for in most of these ways of analysing the codebase is something that surprises us. For example, two files that are unrelated from a feature perspective but often change together. Or a file where no new features were supposed to be added in the last year, but which is modified every week.
I also noticed how unexpectedly rich, from a data perspective, the information in the version control system can be. Throughout my life as a professional software engineer, I have always appreciated how much good version control systems (such as git) bring to my everyday work. The author points out that such systems also hold a lot of information about how engineers behave: which files were modified most recently, which files are modified together, and so on.
What didn’t I like?
To a sceptical eye, it may look like the whole book is a promotion for the software with which the author explores codebases in most parts of the book – CodeScene. And I am not saying that concern is invalid, as many of the beautiful illustrations in the book come from this tool. However, the author includes some free alternatives, and for some of the analyses (for example, detecting files that were modified by the most contributors), only the command line is needed. Also, the project that preceded CodeScene – Code Maat – is open source, so it can be used free of charge.
My general satisfaction with the book is high, and I would love to read more books like this in the future – ones that provide a deep, focused, and reasonably opinionated view of some narrow area of engineering.
Have you read this book? Have you tried any of the approaches the author described to prioritise tech debt?