PCI's Lessons for Passwords Mar 30, 2017
A Promethean Struggle
Let’s take a look at how to protect two ancient technologies: credit cards and passwords. PCI teaches us about handling sensitive data, so let’s consider how we might broaden those lessons to authentication. An underlying theme is how design choices influence and are influenced by user behavior and the scale of an app.
Protect, Control, Isolate
In order to rein in fraud, card issuers developed the PCI Data Security Standard. It establishes a baseline of controls and processes for handling card data. Importantly, these requirements aren’t just about a point in time, like running a pen test or conducting a vulnerability scan; they also establish recurring activities.
PCI’s intent is essentially what you’d expect from a DevOps effort: to design and deploy systems that encrypt and control access to sensitive data. These are hardly controversial ideas. What PCI did was introduce a common language and requirements for apps to meet, so that adherence to those requirements can be evaluated consistently. To make it more forceful, PCI includes consequences for failing this due diligence.
Of course, it’s possible to meet the baselines of compliance and still be insecure. PCI isn’t a guarantee of security — it’s an exercise in managing risk. In order to fully flesh out a security program, you should turn to something like the NIST Cybersecurity Framework.
At a high level, PCI defines six goals that encompass 12 requirements. The goals may seem overbroad, like “build and maintain a secure network” or “protect cardholder data”. But that’s not much different from the kind of guidance you see in the OWASP Top 10. Over time, the standard has responded to feedback, both in terms of clarifying requirements and keeping pace with new technologies. It’s one of the reasons that in-scope PCI systems tend to have better HTTPS configurations than others.
Passwords, Cards, and Innovation
One area where I believe PCI has been successful is where people have been actively avoiding it. Not only might app owners find dealing with compliance a distraction, but auditing systems requires budget and resources. In practice, apps don’t need the card numbers, they just need an authorization to make charges against them.
Thus, if an app deletes and no longer stores card data, then it significantly reduces its PCI burden. (That doesn’t mean it’s absolved of building a secure app, just that PCI won’t hold it responsible for doing so.)
A payments industry has grown up in response to this. So now, instead of handling card data themselves, apps handle tokens that point to third-party systems, and they transparently frame third-party forms that handle the collection of card data.
This strategy shifts from handling the identity of the card to handling a temporary authorization to charge it. In the event of compromise, access to the authorization token shouldn’t put the card at risk of arbitrary use elsewhere.
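To make that shift concrete, here’s a minimal sketch of the server side of tokenization. The provider URL, endpoint, and token format are hypothetical stand-ins rather than any particular vendor’s API; the point is that the app stores and transmits only an opaque token, never the card number.

```python
import requests

# Hypothetical payment provider; real providers define their own
# endpoints, field names, and token formats.
PROVIDER = "https://payments.example.com"

def charge_customer(token: str, amount_cents: int) -> bool:
    """Charge a previously tokenized card. The PAN never touches this app."""
    resp = requests.post(
        f"{PROVIDER}/v1/charges",
        json={"source": token, "amount": amount_cents, "currency": "usd"},
        timeout=10,
    )
    return resp.ok

# The token (e.g. "tok_...") is minted by the provider's hosted form, which
# the app frames; card data flows from the browser straight to the provider,
# so these servers never store, process, or transmit the card number itself.
```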
Promote Careful Implementation
Currently, NIST is drafting new Digital Identity Guidelines. The guidelines put design at the forefront by making three important points: make it easy for users to do the right thing, make it hard to do the wrong thing, and make mistakes recoverable rather than punishing them. A major motivation for this attention to design is to avoid the kinds of failures in encrypted email documented by the 1999 paper, “Why Johnny Can’t Encrypt.”
Delegating identity with protocols like OAuth 2.0 or using SAML for single sign-on is similar to delegating card handling to a third party. It’s an effective strategy for transferring risk. And it sets a mentality that credit card numbers and passwords should be treated like liabilities that infect the systems they touch.
Of course, the future isn’t magically improved by delegation. Apps still need to protect tokens. There’s always a risk calculation and it’s always important to go through threat modeling exercises to understand what might be changing.
For example, moving to an identity provider or service-based tokens makes the account recovery and password reset processes more complex. When recovering an account, it would be important to know whether an attacker added a service to it under their control. One solution is to educate users about this, and give them a means to manually revoke tokens. Unfortunately, this returns to the challenge of good interface design.
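As a rough sketch of that revocation control, consider surfacing every delegated grant during recovery and defaulting to revoking them all. The Grant model and revoke callback here are hypothetical; the real shapes depend on your identity provider.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List

@dataclass
class Grant:
    """Hypothetical record of a delegated authorization on the account."""
    client_name: str      # the third-party service that was granted access
    scopes: List[str]
    issued_at: datetime

def review_grants(grants: List[Grant]) -> None:
    """During recovery, show the user every delegation so rogue ones stand out."""
    for g in grants:
        print(f"{g.client_name}: {', '.join(g.scopes)} (since {g.issued_at:%Y-%m-%d})")

def revoke_all(grants: List[Grant], revoke: Callable[[Grant], None]) -> None:
    """A conservative default: revoke everything; legitimate services re-authorize."""
    for g in grants:
        revoke(g)
```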
With luck, we won’t see future research with titles like, “Why Johnny and Janey Can’t Recover Their Digital Identities.”
Instead, we’ll hope for a future where breaches are no longer measured in the millions of records. We’ll hope for a future where identity is resilient to phishing, where exposed passwords don’t lead to immediate account compromise, where strong authentication doesn’t require weakened privacy, and people continue to shop in safety.
-
Builder, Breaker, Blather, Why Mar 20, 2017
At the beginning of February 2017 I gave a brief talk that noted how Let’s Encrypt and cloud-based architectures encourage positive appsec behaviors. Over a span of barely three weeks, several security events seemed to undercut that thesis: Cloudbleed, SHAttered, and the S3 outage.
Coincidentally, those events also covered the triad of confidentiality, integrity, and availability.
So, let’s revisit that thesis and how we should view those events through a perspective of risk and engineering.
Eventually Encrypted
For well over a decade, at least two major hurdles blocked pervasive HTTPS. The first was convincing sites to deploy HTTPS in the first place and take on the cost of purchasing certificates. The second was getting HTTPS deployments to use strong configurations that enforced TLS for all connections and only used recommended ciphers.
Setting aside the distinctions between security and compliance, PCI was a crucial driver for adopting strong HTTPS. Having a requirement for transport encryption, backed by financial consequences for failure, has been more successful than asking nicely, raising awareness at security conferences, or shaming. I suspect the rate of HTTPS adoption has been far faster for in-scope PCI sites than others.
The SSL Labs project might also be a factor, though it straddles that line between encouragement through observability and shaming. It distilled a comprehensive analysis of a site’s TLS configuration into a simple letter grade. The publicly visible results could be used as a shaming tactic, but that’s a weaker strategy for motivating positive change. Plus, doing so doesn’t address any of the HTTPS hurdles, whether convincing sites to shoulder the cost of obtaining certs or dealing with the overhead of managing them.
Still, SSL Labs provides an easy way for organizations to consistently monitor and evaluate their sites. This is a step towards providing help for migration to HTTPS-only sites. App owners still bear the burden of fixing errors and misconfigurations, but this tool made it easier to measure and track their progress towards strong TLS.
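As one illustration, the SSL Labs assessment API lends itself to scheduled checks. This minimal sketch uses the analyze endpoint and response fields as I understand them from the public API docs; verify against the current documentation before relying on it.

```python
import time
import requests

API = "https://api.ssllabs.com/api/v3/analyze"

def grade(host: str) -> str:
    """Poll SSL Labs until the assessment finishes, then return the grade."""
    params = {"host": host, "fromCache": "on", "maxAge": 24}
    while True:
        report = requests.get(API, params=params, timeout=30).json()
        if "errors" in report:
            raise RuntimeError(str(report["errors"]))
        status = report.get("status")
        if status == "READY":
            return report["endpoints"][0]["grade"]  # e.g. "A+"
        if status == "ERROR":
            raise RuntimeError(report.get("statusMessage", "assessment failed"))
        time.sleep(30)  # assessments take minutes; poll politely

print(grade("example.com"))
```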
Effectively Encrypted
Where SSL Labs inspires behavioral change via metrics, the Let’s Encrypt project empowers behavioral change by addressing fundamental challenges faced by app owners.
Let’s Encrypt eases the resource burden of managing HTTPS endpoints. It removes the initial cost of certs (they’re free!) and reduces the ongoing maintenance cost of deploying, rotating, and handling certs by supporting automation with the ACME protocol. Even so, solving the TLS cert problem is orthogonal to solving the TLS configuration problem. A valid Let’s Encrypt cert might still be deployed to an HTTPS service that gets a bad grade from SSL Labs.
A cert signed with SHA-1, for example, will lower its SSL Labs grade. SHA-1 has been known to be weak for years, and its use has been discouraged, specifically for digital signatures. Having certs that are both free and easy to rotate (i.e. easy to obtain and deploy new ones) makes it easier for sites to migrate off deprecated algorithms. The ability to react quickly to change, whether security-related or not, is a sign of a mature organization. The automation Let’s Encrypt makes possible is a great way to improve that ability.
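Here’s a hedged sketch of one such check, using Python’s standard library plus the widely used cryptography package to flag an endpoint whose leaf certificate is still signed with SHA-1. (A modern TLS stack may refuse the handshake for such a cert outright, which is itself a useful signal.)

```python
import socket
import ssl
from cryptography import x509
from cryptography.hazmat.primitives import hashes

def leaf_uses_sha1(host: str, port: int = 443) -> bool:
    """Fetch the server's leaf cert and check its signature hash."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            der = tls.getpeercert(binary_form=True)
    cert = x509.load_der_x509_certificate(der)
    return isinstance(cert.signature_hash_algorithm, hashes.SHA1)

if leaf_uses_sha1("example.com"):
    print("time to rotate: reissue and redeploy this cert")
```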
Facebook explained their trade-offs along the way to hardening their TLS configuration and deprecating SHA-1. It was an engineering-driven security decision that evaluated solutions and chose among conflicting optimizations – all informed by measures of risk. Engineering is the key word in this paragraph; it’s how systems get built.
Writing down a simple requirement and prototyping something on a single system with a few dozen users is far removed from delivering a service to hundreds of millions of people. WhatsApp’s crypto design fell into a similar discussion of risk-based engineering[1]. This excellent article on messaging app security and privacy is another example of evaluating risk through threat models.
Exceptional Events
Companies like Cloudflare take a step beyond SSL Labs and Let’s Encrypt by offering a service to handle both certs and configuration for sites. They pioneered techniques like Keyless SSL in response to their distinctive threat model of handling private keys for multiple entities.
If you look at the Cloudbleed report and immediately think a service like that should be ditched, it’s important to question the reasoning behind such a risk assessment. Rather than make organizations suffer through the burden of building and maintaining HTTPS, they can have a service that establishes a strong default. Adoption of HTTPS is slow enough, and fraught with error, that services like this make sense for many site owners.
Compare this with Heartbleed. It also affected TLS sites, could be more targeted, and exposed private keys (among other sensitive data). The cleanup was long, laborious, and haphazard. Cloudbleed had significant potential exposure, although its discovery and remediation likely led to a lower realized risk than Heartbleed.
If you’re saying move away from services like that, what in practice are you saying to move towards? Self-hosted systems in a rack in an electrical closet? Systems that will degrade over time and, most likely, never be upgraded to TLS 1.3? That seems ill-advised.
Does the S3 outage raise concern for cloud-based systems? Not to a great degree. Or, at least, not in a new way. If your site was negatively impacted by the downtime, a good use of that time might have been exploring ways to architect fail-over systems or to revisit failure modes and service degradation decisions. Sometimes it’s fine to explicitly accept certain failure modes. That’s what engineering and business do against constraints of resources and budget.
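As one narrow example of making those decisions explicit, here’s a sketch of a cross-region read fallback with boto3. The bucket names are hypothetical, and it assumes cross-region replication to the secondary bucket is already configured.

```python
import boto3
from botocore.exceptions import ClientError, EndpointConnectionError

# Hypothetical (region, bucket) pairs; the replica assumes cross-region
# replication is already in place.
PRIMARY = ("us-east-1", "assets-primary-example")
REPLICA = ("us-west-2", "assets-replica-example")

def fetch(key: str) -> bytes:
    """Read from the primary region; degrade to the replica on failure."""
    for region, bucket in (PRIMARY, REPLICA):
        try:
            s3 = boto3.client("s3", region_name=region)
            return s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        except (ClientError, EndpointConnectionError):
            continue  # an accepted failure mode: try the next region
    raise RuntimeError(f"{key} unavailable in all configured regions")
```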
Coherently Considered
So, let’s leave a few exercises for the reader, a few open-ended questions on threat modeling and engineering.
Flash has long been rebuked for its security weaknesses. As with SHA-1, the infosec community voiced this warning for years. There have even been one or two (ok, lots more than two) demonstrated exploits against it. It persists. It’s embedded in Chrome[2], which you can interpret as a paternalistic effort to sandbox it or, more cynically, an effort to ensure YouTube videos and ad revenue aren’t impacted by an exodus from the plugin.
Browsers have had serious vulns, many carrying risk and impact as measured by the annual $50K+ rewards from Pwn2Own competitions. The minuscule number of browser vendors carries risk beyond just vulns, affecting influence on standards and protections for privacy. Yet more browsers doesn’t necessarily equate to better security models within browsers.
Approaching these kinds of flaws with ideas around resilience, isolation, authn/authz models, or feedback loops are just a few traits of a builder. They can be traits for a breaker as well, in creating attacks against those designs.
Approaching these by explaining design flaws and identifying implementation errors are just a few traits of a breaker. They can be traits for a builder as well, in designing controls and barriers to disrupt attacks.
Approaching these by dismissing complexity, designing systems no one would (or could) use, or highlighting irrelevant flaws is often just blather. Infosec has its share of vacuous and overly ambiguous phrases like military-grade encryption, perfectly secure, artificial intelligence (yeah, I know, blah blah blah Skynet), use-Tor-use-Signal, and more.
There’s a place for mockery and snark. This isn’t concern trolling, which is preoccupied with how things are said. This is about understanding the underlying foundation of what is being said about designs – the risk calculations, the threat models, the constraints.
Constructive Closing
I believe in supporting people to self-identify along the spectrum of builder and breaker rather than pin them to narrow roles – a principle applicable to many more important subjects as well. This is about the intellectual reward of tackling challenges faced by builders and breakers alike, and discarding the blather of uninformed opinions and empty solutions.
I’ll close with this observation from Carl Sagan in The Demon-Haunted World:
It is far better to grasp the universe as it really is than to persist in delusion, however satisfying and reassuring.
Our appsec universe consists of systems and data and users, each in different orbits.
Security should contribute to the gravity that binds them together, not the black hole that tears them apart. Engineering works within the universe as it really is. Shed the delusion that one appsec solution in a vacuum is always universal.
[1] WhatsApp provides great documentation on their designs for end-to-end encryption.
[2] In 2017 Chrome announced they’d remove Flash by the end of 2020.
-
Out of the AppSec Abyss Mar 7, 2017
The AppSec Reanimated series has begun! My goal for this series is to explore positive ways to make security a natural part of the SDLC. We won’t inspire behavioral change by jolting developers with electricity or injecting them with creepy green goo. But we might succeed by highlighting technologies and processes that help security become less of a supernatural event.
This first webinar starts a journey Out of the AppSec Abyss towards the cloudy realms of modern dev concepts. It introduces an underlying theme for this series: How the dimensions of time and space influence appsec.
Time is a key element of modern app development. It also serves as a great metric to track. How long does it take to perform a task well? How might you do the task faster? A task could be updating a system, releasing a patch, analyzing a log for security events, or collecting forensic data. Any of these could fall under the umbrella of continuous integration and continuous deployment (CI/CD).
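As a toy example of treating time as a metric, consider the exposure window between a vuln’s disclosure and its deployed fix. The timestamps below are invented for illustration; in practice they’d come from ticketing or deployment logs.

```python
from datetime import datetime
from statistics import median

# Invented (published, fixed) pairs standing in for real audit-log data.
events = [
    (datetime(2017, 3, 1, 9, 0), datetime(2017, 3, 2, 17, 0)),
    (datetime(2017, 3, 1, 9, 0), datetime(2017, 3, 9, 11, 0)),
    (datetime(2017, 3, 6, 8, 0), datetime(2017, 3, 7, 8, 0)),
]

# Hours each system stayed exposed between disclosure and deployed fix.
windows = [(fixed - published).total_seconds() / 3600 for published, fixed in events]
print(f"median hours exposed: {median(windows):.1f}")
```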
Today’s development methodologies embrace continuous efforts. CI/CD processes have motivated fundamental changes to building and maintaining apps. They blur the distinction between writing code and managing systems, thus creating the role of DevOps. They emphasize ever-shrinking release cycles, thus necessitating robust testing to maintain confidence in (and stability with) such rapid changes.
DevOps has also been empowered by the way cloud environments have abstracted system management into APIs and scriptable interfaces. Being able to treat systems more like code components increases their predictability and consistency (well, at least it should…).
Treating systems like app components that spin up and spin down on demand also sets different expectations for their management. System uptime should no longer be treated as a badge of honor, but as grounds for suspicion that the system has not received patches or updates. The uptime of the service is paramount, not that of the individual systems beneath it.
A benefit of running ephemeral systems like this is that it should be easier to patch and test a single reference image for new instances to clone than to haphazardly patch every current instance (alas, reality does not always follow theory). Think of this as retire and replace as opposed to patch and preserve. The expected security benefit is that the window during which systems remain unpatched should shrink. Or, from a risk perspective, the window of attack against known vulnerabilities on these systems shrinks. In other words, we’re trying to reduce risk rather than chase a more abstract “perfect security.”
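Here’s a hedged sketch of retire and replace as a measurable policy, using boto3 to flag EC2 instances older than an assumed maximum age (the 14-day threshold is arbitrary, chosen only for illustration):

```python
from datetime import datetime, timedelta, timezone
import boto3

MAX_AGE = timedelta(days=14)  # arbitrary illustrative threshold

def stale_instances() -> list:
    """Flag EC2 instances old enough that they should be rebuilt, not patched."""
    ec2 = boto3.client("ec2")
    now = datetime.now(timezone.utc)
    stale = []
    for page in ec2.get_paginator("describe_instances").paginate():
        for reservation in page["Reservations"]:
            for inst in reservation["Instances"]:
                if now - inst["LaunchTime"] > MAX_AGE:
                    stale.append(inst["InstanceId"])
    return stale  # feed these to the replacement pipeline, not to a patch job
```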
There’s a catch. There’s always a catch. This retire and replace premise works well for stateless systems that underpin a service. Stateful systems (think data stores and key/value stores) have different uptime profiles and are more sensitive to nodes constantly going up and down if that impacts availability of data. This catch is a nod to the nuances of app ecosystems, not an argument against the retire and replace model.
CI/CD, DevOps, and cloud can improve security, but they aren’t magic security incantations. They’re more about shifting attack surfaces. Where DevOps removes a lot of separation of duties, it doesn’t remove the burden of separation of data — protecting production data from unauthorized access. The app ecosystem must still ensure attackers can’t trivially bypass hardened cloud networks by merely committing a few lines of code from a compromised dev system.
Think of this shifting of attack surface as the appsec dimension of space.
Even though CI/CD and cloud models encourage behaviors that benefit security, it’s important to see where they introduce new risk or merely rearrange the old.
The ease of spinning up systems may make maintaining patch levels easier, but it may also make scanning and monitoring those systems harder. You’ll always need a strategy for asset management and log collection (this is hardly an earth-shattering claim); it’s just going to have to adapt to the nature (and perhaps scale) of the deployments.
In many cases identity management for services shifts from the systems they run on (like their IP address) to the services themselves. Rather than just tightening IP/port combinations in a firewall, you also need to start thinking in terms of hardening syscalls so that compromised containers don’t infect their peers. The future is compartmented, service-level authorization, not the broad system-level authorization of the past.
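As a rough sketch of that kind of syscall-level hardening, the snippet below emits a deny-by-default seccomp profile for Docker. The allowlist is illustrative and almost certainly incomplete for a real service; derive yours by tracing the workload under test.

```python
import json

# An illustrative allowlist only; a real service needs its own, derived from
# observing the syscalls it actually makes.
ALLOWED = [
    "read", "write", "close", "fstat", "mmap", "munmap", "brk",
    "rt_sigaction", "rt_sigreturn", "futex", "exit_group",
    "socket", "bind", "listen", "accept4", "epoll_ctl", "epoll_wait",
]

profile = {
    "defaultAction": "SCMP_ACT_ERRNO",  # anything not listed fails fast
    "syscalls": [{"names": ALLOWED, "action": "SCMP_ACT_ALLOW"}],
}

with open("seccomp.json", "w") as f:
    json.dump(profile, f, indent=2)

# usage: docker run --security-opt seccomp=seccomp.json <image>
```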
Those who have been paying close attention will have noticed that I’ve been focusing mostly on systems, with coding having received the briefest mention. One premise of CI/CD is that apps should be faster to fix when vulns are discovered. But that assumes the vuln is due to an implementation flaw and not a design flaw. Subtle architecture changes or design modifications have far-reaching implications for a code base that can’t be addressed in a two-week sprint. Also, many modern app frameworks have horrendous dependency graphs. So, while your code may be secure and well-reviewed, the quality of the open source libraries and modules it depends on may be weak.
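To make the dependency-graph point measurable, here’s a small sketch that walks an installed package’s transitive runtime requirements, using the standard library’s importlib.metadata plus the packaging module:

```python
from importlib.metadata import requires, PackageNotFoundError
from packaging.requirements import Requirement

def transitive(name, seen=None):
    """Collect the names of everything an installed package pulls in."""
    seen = set() if seen is None else seen
    try:
        reqs = requires(name) or []
    except PackageNotFoundError:
        return seen  # declared but not installed; stop descending
    for raw in reqs:
        req = Requirement(raw)
        # skip optional extras and requirements for other platforms
        if req.marker and not req.marker.evaluate({"extra": ""}):
            continue
        if req.name not in seen:
            seen.add(req.name)
            transitive(req.name, seen)
    return seen

deps = transitive("requests")  # any installed package name works here
print(f"requests pulls in {len(deps)} packages: {sorted(deps)}")
```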
Couch this dimension of space in terms of attack surface management, where changes potentially reduce the network surface while expanding the application surface. Tackle some easy wins, measure along the way, and reduce the time it takes to complete tasks.
It’s a continuous journey to emerge from the appsec abyss, but with planning, metrics, and perseverance you’ll find many discrete benefits along the way.