Measure (At Least) Twice

20180218_143538 - SmallBear with me here. I promise this is about software – but first I need to talk about building.

I have several uncles and uncles-in-law who are builders, and when I was younger I ended up helping out with minor jobs here and there. One of the most recurrent mantras was this one:

“Measure Twice, Cut Once.”

The idea, if it’s not immediately obvious, is that you double-check your measurements before committing to an action which is costly or hard to undo. Once you’ve cut that beam or board, you can’t un-cut it – the board will stay cut despite your best efforts to turn back time. If you cut it too small, you can no longer use it for what you planned to. If you cut it too big, you have to try again. Measuring is cheap – cutting is expensive. Measure twice, cut once.

I consistently see software engineers fail to even measure once.

I once reviewed a pull request from a developer who had decided to optimise part of our code base which performed intensive weather modelling. He had spent a week on it – and he had spent that time re-writing a large number of string plus operations to use StringBuilders. In some cases this can provide some performance benefit – but the re-written code was in the setup and teardown phases of the modelling. Setup and teardown took only seconds, while the modelling itself could take minutes or hours. Worse, this developer had introduced several bugs into previously-working code; but that would make this an anecdote for a blog post about the importance of testing and safe refactoring, and this is about measuring, so let’s focus on a different “worse”. Not only had he optimised the wrong part of the code base, but because he hadn’t measured performance, he couldn’t demonstrate that he’d actually made anything faster (aside from some vague assertions that StringBuilder is faster than string-plus, which isn’t always true).

I didn’t even need to measure to know that his change wasn’t going to make things significantly faster – the code wasn’t in a loop, and I knew that the overall modelling time was long. Shaving off microseconds at a time in a scenario like that is a losing game. The other reason I knew for certain that his change wasn’t going to pay dividends was that I had already spent quite some time improving the efficiency of that setup process. We had benchmark tests around parts of that code, and we had made significant wins. Most of them were had by fixing the .GetHashCode and .Equals methods of a custom coordinate class we were using. We were wrangling a fairly large amount of data into data structures needed by the modelling code, and set operations like .Contains were taking far too long. It turns out both .GetHashCode and .Equals were creating additional instances of the class to perform certain comparisons, and so we were churning the garbage collector as well as performing far too many operations. Some careful tweaks to those methods – operations performed many times within nested loops – took our setup time down from minutes to seconds. We knew that we’d made things better because we were measuring – but the benefits of measurement go much deeper than just knowing how you did.

All developers fall somewhere on the spectrum between, roughly, terrible and genius. I’m not sure where on that spectrum I fall (I suspect it varies depending on what sort of programming I’m doing), but I am pretty sure that I wasn’t good enough to read through thousands of lines of C# across many different files and realise, through pure mental analysis, that we had a performance problem in our coordinate comparison code. Even if I had noticed that it wasn’t efficient, I wouldn’t have been sure how much the problem was contributing to our run time (as compared to, say, our inefficient string concatenation). Instead, I found out by measuring.

In this case, I used one of the simplest of all code profiling techniques: the pause button in my debugger. Simply run your code, and sometime during the long-running process, hit pause. The line you’ve broken on is extremely likely to be part of the problem. Typically, it will fall into one of two categories:
1. A line which is executing many times. In this situation, something further up the stack trace is calling the code you’re looking at inside a large (or nested) loop.
2. A line which is taking a long time to execute. Perhaps it’s hitting disk or network and waiting a long time before continuing.
There is a third (statisically less-likely) option: you’ve just happened to break execution somewhere that isn’t part of the performance problem. That’s why you measure (at least) twice. Break your long-running operation a few times, and compare the stack traces. If your stack dump consistently looks the same, you’ve found your performance hot-spot.

That’s one of the simplest ways of measuring your code, and it may suffice for tracking down hot-spots, but there is a lot more to measurement than just hitting pause in your debugger. Do you suspect you have a database problem? Don’t go hunting through code looking for your favourite anti-patterns. Instead, measure the number of database round-trips your operation makes. Use your database tools to identify expensive queries – and even once you’ve found them, don’t necessarily just start staring at code. Are you missing an index? There are tools which can tell you. Even if the query is bad, looking at the query plan will tell you exactly which parts of the query are causing the problem, so you don’t need to guess.

Maybe you are hitting multiple services and combining the results? If you’ve just learned about the async keyword, you might be tempted to dive in and start making everything async – but if almost all of the delay is coming from a single call, it might not help much (you’d be better trying to find out why that service is so slow). If you really are dealing with multiple similar-cost calls which you combine after they’re all complete, async may be a good investment of your time – but you won’t know until you’ve measured.

For one weather-forecasting algorithm I measured, I didn’t turn up anything obvious. I measured and measured some more, and eventually I discovered we were utilising the CPU cache poorly – the bulk of our algorithm time was spent fetching data from main memory after cache misses. Updating the algorithm was a very costly measure, but investing in hardware with better memory bandwidth gave us some short-term wins. There are very few developers who can look at an algorithm and spot a cache utilisation problem, but measurement can find it for you.

I think I understand why some developers prefer to dive right into code, instead of measuring the problem. We like to think we’re smart. We know stuff about coding, and we’re good at it, and so we can jump in and spot what the original developer didn’t. We can use the sheer power of our minds to solve the problem. It’s a little like the builder who drives in nails with his forehead. It might be impressive, but it’s not very effective, and you’ll end up with a headache.

The opposite problem happens too: we make the mistake of not valuing our time. “What can it hurt to spend a week re-writing those string-plus operations as StringBuilder operations,” we ask ourselves. It makes our code better (by one measure, I suppose), and it won’t hurt performance. But if we spend a week of our time, was it really worth the investment? Measure up front, so you can make that decision based on facts instead of guesses.

Perhaps some developers just feel like they have great intuition. One of my uncles was actually pretty good at estimation: he’d cut timber a shade too big, and trim it down as he was placing it. He only did that occasionally, though. A lifetime of building told him when it made sense to trust his estimation skills, and when to measure twice, cut once.

So we’re back to builders, and here’s the deep lesson. If a builder will measure twice to avoid wasting a bit of cheap wood, how much more should we measure to avoid wasting precious time? I’ve seen weeks wasted on the wrong problem, simply because someone didn’t bother to measure.

Please, my fellow developers: measure (at least) twice.

Leave a Reply

Your email address will not be published. Required fields are marked *