Avoid BugOps, Do DevOps

DevOps aims to release code quickly and with confidence. Frequent, fast releases aren’t the hard part. The challenge is achieving justifiable confidence that changes won’t break the production environment and that, when a break inevitably happens, teams can quickly analyze and resolve the problem.

[Image: etching of villagers building a fence. Caption: *Not chasing vulns, but defeating vuln classes*]

This level of maturity requires smart investment in automation, testing, and monitoring. (And people! We’ll dive into that angle in a different post.) Automation increases the pace at which code goes through review, testing, and deployment. Testing (hopefully) detects errors and halts the deployment to prevent bugs from reaching production. Monitoring helps ensure the app remains stable under constant change by providing feedback about its health and activity.

Code isn’t perfect. Perhaps an automated step mishandled an error condition, or testing had a gap in coverage that didn’t exercise functionality affected by a code change. Or maybe the app’s monitoring missed a type of event or omitted useful info. Bugs happen.

Vulnerabilities are bugs that impact the security of the app, its data, or its users. Vuln discovery is important. It’s one of the reasons bug bounties have become so pervasive. We want to know what bugs our apps have, especially if they’ve reached production. And we also have to fix them.

Being able to fix vulns fast is commendable. But a too-narrow focus on speed can turn DevOps into BugOps. BugOps is releasing code quickly to fix vulns without considering their underlying cause. It leads to an endless loop of find-fix-repeat. While it’s important to fix vulns promptly, just adding quick patch after quick patch only makes an app more brittle.

Metrics are a critical tool for decision making. But a shortsighted devotion to metrics aggravates the BugOps mentality. Meeting fix-time SLAs while ignoring root causes creates the illusion of secure code. A quickly patched app may still rest on a weak architecture.

Another tenet of DevOps is building feedback loops — collecting and responding to actions and events throughout the development pipeline. This should apply equally to vuln discovery.

When vulns appear in production, it’s especially important to analyze how they arrived there and what quality controls they bypassed. They might be due to a mistake, where a coding guideline or established process wasn’t followed. Or they might be due to a misunderstanding, where some flaw in the app’s architecture was exposed or some process didn’t exist.

This analysis can inspire fundamental changes to an app’s design that sweep away whole classes of vulns. Or it may introduce controls that make the exploitation of vulns less impactful and more evident.
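One familiar illustration of a design change that removes a class of vulns, rather than one bug at a time, is routing every database lookup through parameterized queries so that SQL injection has no string concatenation to exploit. The sketch below uses Python's standard `sqlite3` module. The `users` table and the `find_user` and `make_demo_db` helpers are hypothetical names chosen for the example, not something this post prescribes.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user(conn: sqlite3.Connection, name: str):
    """Look up users by name with a parameterized query."""
    # The ? placeholder keeps `name` as data. It is never spliced into the
    # SQL text, so quote characters in the input cannot change the query.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cur.fetchall()


if __name__ == "__main__":
    db = make_demo_db()
    print(find_user(db, "alice"))
    print(find_user(db, "' OR '1'='1"))  # treated as a literal name
```

If every query in the app goes through helpers like this, a reviewer or linter only has to flag raw string-built SQL, which is a far smaller job than auditing each query for injection.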

Good analysis provides insight into gaps in tools, knowledge, or process. For example, if your testing framework can’t model the types of vulns that are being reported, then you have two problems. One, you won’t be able to create effective regression tests. Two, you’re being underserved by automation.
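As a rough sketch of what modeling a type of vuln in tests can look like, the example below keeps a small table of path traversal payloads and checks that all of them are rejected, instead of testing only the one input from the latest report. Everything here is an assumption made for illustration: the `resolve_upload` function, the `/srv/uploads` directory, and the payload list are hypothetical.

```python
from pathlib import Path

# Hypothetical payloads standing in for one vuln class (path traversal).
# In practice this list would grow from past reports.
TRAVERSAL_PAYLOADS = ["../secret.txt", "a/../../secret.txt", "/etc/passwd"]


def resolve_upload(base_dir: str, requested: str) -> Path:
    """Resolve a requested file name, refusing anything outside base_dir."""
    base = Path(base_dir).resolve()
    candidate = (base / requested).resolve()
    # Accept only the base directory itself or paths beneath it.
    if candidate != base and base not in candidate.parents:
        raise ValueError(f"path escapes upload directory: {requested!r}")
    return candidate


def unrejected_payloads(base_dir: str, payloads=TRAVERSAL_PAYLOADS):
    """Return the payloads that were NOT rejected; an empty list means pass."""
    leaked = []
    for payload in payloads:
        try:
            resolve_upload(base_dir, payload)
        except ValueError:
            continue  # rejected, as intended
        leaked.append(payload)
    return leaked


if __name__ == "__main__":
    assert unrejected_payloads("/srv/uploads") == []
    print("all traversal payloads rejected")
```

When a new report arrives, adding its payload to the table turns the fix into a regression test for the whole class. If the test framework cannot express the payload at all, that is the automation gap described above.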

Good metrics provide insight into how well a DevOps team handles security. Collecting metrics emphasizes what topics are important (hence, worthy of measurement). Metrics over time produce trends. Trends provide feedback about the effectiveness of security tactics such as introducing a new tool, adjusting a process, or adopting new programming patterns. Some useful metrics related to vuln discovery are:

- Type of discovered vulns. Do certain categories stand out? Do they share similar causes?
- Risk of discovered vulns. Ideally, rate each vuln on a common scale such as a CVSS score. No rating is perfect, but CVSS provides a shared frame of reference for severity, which in turn informs risk.
- Speed of fix. What was the time between discovery of the vuln and the code commit that fixed it? How does this measure against expectations or explicit SLAs?
- Speed of deployment. How long does it take for a commit to reach production? Is there a fast path for code that addresses critical issues? Does the app have feature flags that can trivially enable or disable problem areas until a fix is ready?
- Location (e.g., files, objects, functions) of vulns within source code. Review the commit history associated with vulns. Do developers repeatedly address vulns in a particular code path? Is a vulnerable pattern repeated elsewhere, waiting to be reported as vulnerable? Are particular developers responsible for weaker code? Is any automation or tool capable of identifying the vuln?
- Staleness of the location of vulns within source code. In addition to space (i.e., where), capture time: when was the affected code last touched before the fix? Is it older, legacy code? Is it newer code? This can help highlight whether the app is on a path of cleanup to improve its overall quality or remains stuck with the same eternal programming mistakes.
- Effort to fix. Related to speed, this is more about the cost associated with fixing vulns. It may be a measure of hours required to analyze the vuln and commit a fix. It could also be the number of people involved in the process. For example, a vuln might require a complex fix or many engineering discussions to weigh trade-offs.
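Several of the metrics above fall out of a handful of timestamps and labels per vuln. The following is a minimal sketch, assuming a hypothetical `VulnRecord` shape and a made-up seven-day fix SLA; real data would more likely come from an issue tracker and commit history. It assumes at least one record, since `statistics.median` raises on empty input.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class VulnRecord:
    """One fixed vuln. All field names are hypothetical."""
    vuln_type: str           # e.g. "xss", "sqli"
    cvss: float              # severity score used as a common rating
    path: str                # file where the fix landed
    discovered: datetime     # when the vuln was reported or found
    fix_committed: datetime  # when the fixing commit was made
    deployed: datetime       # when that commit reached production


def summarize(records, fix_sla=timedelta(days=7)):
    """Summarize type, location, speed of fix, and speed of deployment."""
    fix_times = [r.fix_committed - r.discovered for r in records]
    deploy_times = [r.deployed - r.fix_committed for r in records]
    return {
        "by_type": Counter(r.vuln_type for r in records),
        "by_path": Counter(r.path for r in records),
        "max_cvss": max(r.cvss for r in records),
        "median_fix_hours": median(t.total_seconds() for t in fix_times) / 3600,
        "median_deploy_hours": median(t.total_seconds() for t in deploy_times) / 3600,
        "sla_breaches": sum(t > fix_sla for t in fix_times),
    }
```

Running `summarize` on each month's records and comparing the results gives the trend view. A path that keeps topping `by_path` is a candidate for the root-cause analysis discussed earlier.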

Avoid letting a swarm of vulns chase your team down the BugOps path. As you fix vulns, take the time to figure out how they might have crept into production, adjust tools and processes to catch similar errors that might occur in the future, and track metrics that help show what kind of progress your DevOps team is making to reduce risk.

Keep an eye out for vulns. Keep your vision on the processes that make DevOps successful.