Mean Time to Remediate

One of my coworkers recently found this article that states mean time to remediate (MTTR) is worthless, which certainly goes against our four golden metrics of VM. Spoiler alert: MTTR is one of them. So what’s the difference? Vulnerability management (VM) is not the same as a Security Operations Center (SOC). So let’s talk about those differences and why I measure MTTR in VM, while agreeing it’s not a very good statistic for SOC.

VM vs SOC

There are certainly some similarities in the skill sets required for both VM and SOC. I’ve even worked at places where one position was split half-time between the two. I was the prototype for that role, and I filled it capably for about a year before shifting full time to VM. Both job duties relied heavily on my previous experience as a system administrator. To me, the big difference is whether you want to play a short game or a long game. SOC is usually a short game. VM is a long game.

SOC is a sprint. And sometimes it’s the one where you’re on third base, it’s game seven of the World Series, you’re down by a run in the ninth inning with one out, and the batter just hit a fly ball… and you’re gonna either tie that game or hurt yourself trying.

VM is a marathon, or, perhaps more accurately, a series of marathons on an artificial leg much better suited to helping you limp through a trip to the corner market than running marathons. And it’s not at all uncommon for one person or team to be asked to do both, because the two jobs at least have some similarity and overlapping skill sets. It’s not the perfect fit, but it’s a better fit than having someone from pretty much any other specialty in security try to handle VM.

Ideally, you want to let people specialize. I could have gone either direction. I have a very good friend in this industry who has done both, and just switched from one to the other. We both found that to do either role well, you need to specialize in it. You’ll do much better doing one or the other for a year than doing both every day for a year.

The differences are poorly understood, and the results can be detrimental. What’s the difference between a maxillofacial surgeon and a facial plastic surgeon? They sound awfully similar, but a maxillofacial surgeon is a dentist. Both of them did way better than I did in high school biology class, and there’s some overlap in their skills, but they aren’t interchangeable. And I’m going to take their word for it that they aren’t, because they both know lots of things I don’t.

We have a tendency to treat VM like SOC because SOC is well understood. There are enough certifications and classes related to SOC that you can take one a year for 20 years or more before you run out. When it comes to VM, there are a handful of vendor-specific certifications and one SANS class, and not much else.

VM needs to be treated like its own specialty, and not just a part-time gig for incident responders or the place you park people who want to be pen testers but aren’t ready for that yet.

Why I Measure MTTR

I don’t disagree that MTTR is a less than ideal statistic for SOC work. There are too many outliers. One time a SOC analyst called me and told me my servers were communicating on P2P filesharing networks. “Lightwave Networks? No, that’s not P2P, you’re thinking of Limewire. That’s different. Lightwave is a datacenter provider. That’s the server looking for updates.”

Granted, the servers weren’t supposed to be trying to auto-update. But that’s a much less severe problem than trading illicit MP3s on Limewire. Resolution took minutes: two minutes to figure out what was happening, and two minutes to disable the automatic update service.

At the other extreme, you’ve got any incident that involves legal. I once got roped into an incident for lack of anyone else who even knew where to start. It involved a former employee losing a few million dollars and deleting files and messages on his way out to try to cover his tracks. Being the only person in that particular organization who knew it was possible to undelete a file, let alone how to use a sector editor, I ended up being the person talking to legal counsel and then digging around on that hard drive to see what I could find. My methods would make any expert cringe today, but it was 1999. I got the attorney what she needed to get my then-employer a nice settlement, but if you measure my success by MTTR, I did a lousy job because it took years. No one cared about the MTTR, though. They cared about the money, which was more than I’ll make in my career.

In incident response, the right answer for MTTR is like the answer my English teacher used to give when we asked how long a paper needed to be. “The minimum it takes to do well.”

Why MTTR is Important in Vulnerability Management

VM is different in two ways. First of all, as a VM practitioner, I’m not measuring my MTTR. I’m measuring another team’s MTTR. My own MTTR is worthless for me too. What are you going to measure, how long it takes me to scan your network? That depends on the tool and the type of scan I use. But we have a length of time we are willing to tolerate our systems being vulnerable. That length of time depends on our risk appetite, but every company has some idea. And most of them set a policy. Whether remediation teams are following that policy is something you need to know. It’s just like when I sold computers at retail when I was 19: my manager tracked how many extended warranties I sold, because they had a policy for that.
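Here is a minimal sketch of that kind of check, assuming you can export remediation records with a team name, a first-detected date, and a remediated date. The 30-day policy, the team names, and the dates are all made up for illustration; your scanner’s export format and your company’s policy will differ.

```python
from datetime import date
from statistics import mean

# Hypothetical policy: maximum days a finding may stay open.
POLICY_DAYS = 30

# Hypothetical remediation records: (team, first_detected, remediated).
findings = [
    ("servers", date(2024, 1, 2), date(2024, 1, 20)),
    ("servers", date(2024, 1, 5), date(2024, 2, 24)),
    ("desktops", date(2024, 1, 3), date(2024, 1, 13)),
    ("desktops", date(2024, 1, 10), date(2024, 1, 24)),
]


def mttr_by_team(records):
    """Return {team: mean days from detection to remediation}."""
    days = {}
    for team, detected, remediated in records:
        days.setdefault(team, []).append((remediated - detected).days)
    return {team: mean(values) for team, values in days.items()}


report = mttr_by_team(findings)
for team, mttr in sorted(report.items()):
    status = "within policy" if mttr <= POLICY_DAYS else "over policy"
    print(f"{team}: {mttr:.1f} days ({status})")
```

The point of grouping by team is that the number belongs to the remediation team, not to the person running the scans.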

Second, when it comes to applying a patch, there are fewer factors beyond the company’s control. Sure, sometimes you miss MTTR because there was no maintenance window that month. But the lack of a maintenance window was a decision the company made. It’s not like there’s any dependency on the legal system. If the legal system is involved in an incident, it can drag out for years.

The right answer for MTTR in VM isn’t the minimum it takes to do well. I can give you a number. It can vary, so we may need to have a conversation, but at the end of that conversation, we should be able to agree on a number.

And when I talk to someone who isn’t happy with how their VM program is going, they almost always say the teams aren’t resolving vulnerabilities fast enough. It may or may not be their first complaint, but it’s almost always on the list. And the way that you address that situation is by measuring MTTR, and then looking for ways to reduce it.
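One rough way to see whether MTTR is actually coming down is to group closed findings by the month they were remediated and compare the averages. The sketch below assumes hypothetical (first detected, remediated) date pairs; it only counts findings that have been closed, so anything still open is invisible to it and needs to be tracked separately.

```python
from datetime import date
from statistics import mean

# Hypothetical closed findings: (first_detected, remediated).
closed = [
    (date(2024, 1, 2), date(2024, 2, 21)),
    (date(2024, 1, 10), date(2024, 2, 19)),
    (date(2024, 2, 1), date(2024, 3, 12)),
    (date(2024, 2, 15), date(2024, 3, 16)),
    (date(2024, 3, 5), date(2024, 4, 4)),
    (date(2024, 3, 20), date(2024, 4, 9)),
]


def mttr_by_month(records):
    """Return {YYYY-MM of remediation: mean days open}, oldest month first."""
    buckets = {}
    for detected, remediated in records:
        month = remediated.strftime("%Y-%m")
        buckets.setdefault(month, []).append((remediated - detected).days)
    return {month: mean(days) for month, days in sorted(buckets.items())}


trend = mttr_by_month(closed)
values = list(trend.values())
# True only if every month's MTTR is lower than the month before it.
improving = all(later < earlier for earlier, later in zip(values, values[1:]))

for month, mttr in trend.items():
    print(f"{month}: {mttr:.1f} days")
print("MTTR is falling" if improving else "MTTR is not consistently falling")
```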

That’s why I measure MTTR, and that’s why MTTR is on our list of golden metrics for VM.