There are some things in life you can’t escape, like death and taxes. And for us techies, there are programmers creating bugs.
It’s inevitable that software developers end up devoting a great deal of their time to debugging code. Debugging is therefore an important skill to master for maximising efficiency. Regrettably, even the most proficient of developers have a tendency to be, well… clumsy.
There is an abundance of developers who can breeze through new features, write code and implement the business requirements as if the great Dennis Ritchie himself were tutoring them, but have you ever wondered who cleans up the colossal number of bugs they leave behind?
It’s one thing to know how to write beautiful, elegant code. It’s another thing to know how to debug the most disagreeable code you’ve ever seen in your lifetime, written by that previously mentioned mythical person who single-handedly managed to put together the entire application in 48 hours.
As luck would have it, debugging – like any other skill – is something that can be learned. If you apply the right techniques and practices, you can become great at it. Who knows? You might even enjoy it.
The real secret to debugging
The secret ingredient to debugging is this: it’s all about the mindset. It’s about taking a logical line of attack on the problem: not rushing, not guessing at the fix, and most certainly not assuming you can spot the cause at a glance.
Most developers have adopted the approach of shooting first, shooting some more, and then, when everybody’s dead, asking a question or two. Debugging should be about staying calm and collected, and attacking the problem from a rational, analytical perspective instead of a dramatic one.
Before we take a deep dive, let’s ask: what exactly is debugging? It sounds pretty obvious, right? You open up the debugger perspective and fix the glitches you find in your code. That is where you would be unequivocally mistaken. Debugging has got absolutely nothing to do with the debugger.
Debugging has everything to do with:
- Finding the cause of a problem in the code base
- Recognising the possible reasons for it
- Trying out hypotheses until the eventual root cause is discovered
- Then, in due course, removing that cause and guaranteeing that it will never happen again.
My argument is this: debugging is more than fiddling around in a debugger and changing code while waiting for the business logic to magically kick in.
Now, let’s walk through the typical programmer’s debugging process and you will see exactly what I mean.
The one thing you should NOT do while debugging
So you come into the office and your project manager says he’s got some bugs for you to fix. You, the programmer, decide to sit down at your cubicle with the thought “I will let loose the full power of my intellectual prowess on this blasphemous terror of bugs”.
You start up the debugger. Cautiously you step through the code. Time seems to blur, minutes change into hours, hours into weeks. Slowly, you become the old bloke sitting at the keyboard, motionless in the unchanged debugging session, but in some way you are “closer.” Your children have all grown. Your wife may have left you. The only thing that remains is… the bug.
What an absurd number of programmers do when they want to debug an issue in the code is fire up the debugger and start looking around from one place to another. Never do this. The debugger ought to be your last resort. When you start up the debugger straightaway, you are saying, “I do not have any idea what the root of the problem is, so I will just skim through everything and try to comprehend what is going on.”
It’s like those chaps we see on the road every once in a while: when their car breaks down, even though they know nothing about cars, they still pop open the bonnet and expect to find something inside, giving the outside world the false impression that they know what they are doing. A real mechanic might not even stop to help, thinking this fellow has everything under control.
The exact same principle applies when debugging. You need to know what you are looking for. Do not misunderstand me: the debugger is a wonderful and powerful tool. When used properly, it can help you track down all categories of glitches and see what happens when your code is running. Nevertheless, the debugger is not the place to start. Numerous bugs can be resolved without ever touching it.
The one thing you absolutely MUST do while debugging
So, what do you do if you are not supposed to fire up the debugger when you begin to debug an issue? The first thing any well-balanced, rational software engineer ought to do is reproduce the bug. Why? To be certain that the issue is truly a bug and that you will be able to debug it.
One hundred percent of glitches that cannot be replicated cannot be debugged. So if you cannot reliably replicate the problem, there is no point in trying to debug it yet. You are exceedingly unlikely to fix a problem that cannot be properly replicated. And even if you did fix it, how on earth would you verify that it was really fixed?
Four steps to debugging effectively
1. Replicate the bug
So, when you are trying to fix a bug, your initial objective should be proving that you can replicate the bug yourself. If you cannot, you ought to get the tester who raised the issue in the first place to replicate it for you. If the bug seems intermittent and cannot be reliably replicated, this effectively means one of two things: either you do not know the code base well enough, or your environment is missing some of the conditions that are required to reproduce the problem.
In situations like these, try to gather more evidence using some of the following practices: inserting more logging statements in the code, asking the developer who originally wrote the code, or asking people who have been working on the same module.
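As a rough illustration of the first practice, here is a minimal sketch using the standard java.util.logging API. The class EvidenceLogging, the method process() and its parameters are hypothetical stand-ins for whatever code path you suspect; the point is simply to record inputs, results and the calling thread so that a failing run can later be compared with a healthy one.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class EvidenceLogging {
    private static final Logger LOG = Logger.getLogger(EvidenceLogging.class.getName());

    // Hypothetical operation on the suspected code path (an assumption for illustration).
    static int process(String orderId, int quantity) {
        // Record the inputs and the calling thread before doing any work.
        LOG.log(Level.INFO, "process() entered: orderId={0}, quantity={1}, thread={2}",
                new Object[] {orderId, quantity, Thread.currentThread().getName()});

        int result = quantity * 2; // placeholder for the real business logic

        // Record the outcome so it can be matched against the inputs above.
        LOG.log(Level.INFO, "process() finished: orderId={0}, result={1}",
                new Object[] {orderId, result});
        return result;
    }

    public static void main(String[] args) {
        process("A-1", 3);
    }
}
```

Logging the thread name is a deliberate choice here: bugs that look intermittent often depend on timing or concurrency, and that only shows up if the evidence includes who called what.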
Always understand that there is no such thing as a truly intermittent problem, only a problem whose trigger conditions you have not yet identified. If you do not understand the bug well enough to replicate it, you have a very slim chance of fixing it, even by a lucky guess, and you will have an extremely difficult time knowing whether your fix even worked. Always find a way to replicate the glitch, even if it is only replicable in the production environment.
2. Sit and think
Right after you reproduce your glitch comes the most crucial stage, and the one that software developers tend to skip because they are in such a hurry to fix the glitch. Your next step is to sit and think. Yes, that’s correct. Ponder the glitch and what its conceivable roots could be. Consider how the system works and possible explanations for the odd behaviour you are seeing.
You are going to be in a rush to jump into the code and into the debugger and start “looking at things”, but before you start doing that, it is vital to understand what you are looking for and what exactly to look at. You will likely come up with a few hypotheses about what might be causing the glitch. You ought to have at least two or three that you can test before you move on.
3. Test your concepts
Most of the time, your hypotheses are not going to work out. That is just life. If that is the case, the next best thing you can do is to check your assumptions about how things are working. We naturally assume that code is working a certain way, or that some input or output must be some value. Time and again we think, “this cannot possibly be happening!” and often we are proven wrong. It happens to the best of us. The best thing you can do with these assumptions is to validate them.
You do this by writing unit tests. Write a few unit tests that check the apparently obvious things that have to be working along the workflow of the problem you are trying to debug. Most of these tests should easily pass, but every once in a while, you’ll write a unit test for some evident assumption and the results will be nothing short of shocking. Always remember that if the answer to your glitch were obvious, it wouldn’t be a glitch at all.
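Here is a minimal sketch of what such assumption checks might look like. In a real project these would normally live in your test framework of choice; to keep the example self-contained, it uses plain checks instead. The class FramingAssumptions and the frame() helper are hypothetical: they stand in for a piece of code you “know” prepends a 4-byte length field to a message.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class FramingAssumptions {
    // Hypothetical code under suspicion: prepends a 4-byte big-endian length field.
    static byte[] frame(byte[] payload) {
        return ByteBuffer.allocate(4 + payload.length)
                .putInt(payload.length)
                .put(payload)
                .array();
    }

    // Fails loudly and names the assumption that turned out to be false.
    static void check(boolean condition, String assumption) {
        if (!condition) {
            throw new AssertionError("Assumption violated: " + assumption);
        }
    }

    // Each check states one "obvious" belief about the code and verifies it.
    static void runChecks() {
        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] framed = frame(payload);

        check(framed.length == payload.length + 4,
                "framed message is exactly 4 bytes longer than the payload");
        check(ByteBuffer.wrap(framed).getInt() == payload.length,
                "length field holds the payload length");
        check(frame(new byte[0]).length == 4,
                "an empty payload still gets a length field");
    }

    public static void main(String[] args) {
        runChecks();
        System.out.println("All assumptions hold");
    }
}
```

None of these checks is clever, and that is the point: each one turns a belief you would normally never question into something the machine confirms for you.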
4. Understand how you fixed it
If you fix a problem, understand what you did to fix it. If you do not understand why what you did fixed the problem, you are not done yet. For all you know, you may have inadvertently caused a different problem or, more likely, you haven’t fixed your original problem at all. Problems never go away on their own.
When you fix the glitch, don’t stop there. Explore a little further and make sure you understand exactly what was going on that caused the problem in the first place, and how your solution fixed it. When software developers debug a glitch by fiddling around with the code and it seemingly starts working, they assume it is fixed without even knowing why.
This is a dangerous habit for many reasons. As stated above, when you unsystematically tweak gears in the system and alter bits of code here and there, you could be triggering all kinds of other glitches without realising it. More importantly, you are training yourself to be a terrible debugger! You might get lucky from time to time, but you won’t have a repeatable procedure or a dependable skillset for debugging.
Effective debugging in practice
A while ago I resolved a nasty bug for a client. It had to do with Secure Sockets Layer (SSL) connections to a third-party client’s system being dropped at random. The third-party client was using ancient software, and my client had custom code that connected to the third-party client’s system over SSL.
There had been numerous efforts to resolve this glitch, which were grounded in not much more than guesswork. One of these guesses (which seemed to work for some time) was to put all the information into a buffer and then issue a single write() call on the OutputStream of the SSLSocket, with the expectation that it would all be sent as a single SSL record. For a while this gave the impression that it was working, but the bug would occasionally reappear after this so-called “fix.”
Finally the third-party client called in the vendor of their Jurassic-era software, who in due course discovered precisely why their software was intermittently dropping the connection. The messages that we were sending began with a 4-byte length field. They observed that our data was fragmented into two SSL records: the first record consisted of merely the very first byte of the message, and the second consisted of the rest of the message. Their software was not equipped to handle that; it expected the first SSL record to contain at least the 4-byte length field.
The mystery now was why our code was sending the first byte in a separate SSL record. To find out why this occurred, I ran the code in debug mode and stepped into the JDK source code. After some time I came across the class sun.security.ssl.SSLSocketImpl. The getOutputStream() method of this class returns a sun.security.ssl.AppOutputStream, and the class AppOutputStream implements the write() method. After reading it carefully, I saw something that looked a bit suspicious: a few lines there decide how many bytes of data to put in an SSL record.
Under certain circumstances, if isFirstRecordOfThePayload and c.needToSplitPayload() are both true, it chooses to put at most 1 byte into the SSL record. This is precisely what the client was seeing and what was causing the problem. It appeared, however, that this was done intentionally and was not a bug in the JDK. But why? And what is in the method needToSplitPayload()? This is a method in class SSLSocketImpl. After reading about it in detail, I understood that it was a workaround for a security problem with TLSv1.0 and older. Regrettably, the third-party client’s software couldn’t deal with this workaround.
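The behaviour described above can be modelled with a small sketch. This is emphatically not the JDK source: the class RecordSplitSketch, the method recordSizes() and the maxRecordSize parameter are invented for illustration, and only the names isFirstRecordOfThePayload and needToSplitPayload are borrowed from the description. It simply shows how such a rule turns one write() into a 1-byte record followed by the remainder.

```java
import java.util.ArrayList;
import java.util.List;

class RecordSplitSketch {
    // Hypothetical model: returns the size of each record a payload is cut into.
    static List<Integer> recordSizes(int payloadLength, boolean needToSplitPayload,
                                     int maxRecordSize) {
        List<Integer> sizes = new ArrayList<>();
        boolean isFirstRecordOfThePayload = true;
        int remaining = payloadLength;

        while (remaining > 0) {
            int howMuch;
            if (isFirstRecordOfThePayload && needToSplitPayload) {
                // The workaround: the first record carries at most one byte.
                howMuch = Math.min(1, remaining);
            } else {
                // Otherwise fill the record as far as the size limit allows.
                howMuch = Math.min(remaining, maxRecordSize);
            }
            sizes.add(howMuch);
            remaining -= howMuch;
            isFirstRecordOfThePayload = false;
        }
        return sizes;
    }

    public static void main(String[] args) {
        // A 104-byte message (4-byte length field plus 100 bytes of data).
        System.out.println(recordSizes(104, true, 16384));  // prints [1, 103]
        System.out.println(recordSizes(104, false, 16384)); // prints [104]
    }
}
```

With the split active, the 4-byte length field straddles two records, which is exactly the situation the third-party software could not handle.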
Luckily, the people at Sun (now part of Oracle) anticipated that the workaround might cause compatibility snags, so they provided a way to deactivate it by setting the system property jsse.enableCBCProtection to false. We had a choice between disabling the workaround that way, which would leave the software vulnerable to the security flaw, or ensuring that we used TLSv1.1 or newer, which is what the management finally decided to do.
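For the second option, one way to keep older protocol versions off a connection is to restrict the protocols enabled on the socket. The sketch below is an assumption about how that could be done, not a record of what the client actually deployed; the class ModernTlsOnly and its helper methods are hypothetical, while SSLContext.getDefault(), getSupportedProtocols() and setEnabledProtocols() are standard JSSE calls. The first option would instead amount to starting the JVM with -Djsse.enableCBCProtection=false.

```java
import java.util.Arrays;
import java.util.List;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocket;

class ModernTlsOnly {
    // Protocol names to switch off: everything older than TLSv1.1.
    private static final List<String> LEGACY = Arrays.asList("SSLv2Hello", "SSLv3", "TLSv1");

    // Keeps only the protocol names that are not in the legacy list.
    static String[] withoutLegacy(String[] supported) {
        return Arrays.stream(supported)
                .filter(protocol -> !LEGACY.contains(protocol))
                .toArray(String[]::new);
    }

    // Creates an unconnected socket that will only negotiate the newer protocols.
    static SSLSocket newUnconnectedSocket() throws Exception {
        SSLContext context = SSLContext.getDefault();
        SSLSocket socket = (SSLSocket) context.getSocketFactory().createSocket();
        socket.setEnabledProtocols(withoutLegacy(socket.getSupportedProtocols()));
        return socket;
    }

    public static void main(String[] args) throws Exception {
        try (SSLSocket socket = newUnconnectedSocket()) {
            // The exact list depends on the JDK version in use.
            System.out.println(Arrays.toString(socket.getEnabledProtocols()));
        }
    }
}
```

Filtering the supported list, rather than hard-coding the names of the newer versions, keeps the sketch working across JDK releases that support different protocol sets.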
Debugging – like software development – has both a science and an art to it. You can only excel at debugging through constant practice, which you will most certainly get throughout your career. After all, life’s inevitabilities are death, taxes and programmers creating bugs.