The Case for Self-Sovereign Identity

This piece is the second installment of a series on digital identity called “Please Allow Me to Introduce Myself: The Past, Present, and Future of Digital Identity.” This series aims to explore the evolution of digital identity, the state of self-sovereign identity today, and its use cases.

Check out previous installments here: Part 1: The Evolution of Digital Identity

Photo by Shane Avery on Unsplash

In my last piece, I concluded with a somewhat grandiose claim — that self-sovereign identity (SSI) has the potential to be the future of the identity and access management industry. I went on to identify some broad guiding principles that SSI entails. In this piece, the rubber meets the road. I begin by exploring the mechanics of SSI, focusing on the fundamental components of any SSI solution, and the benefits of such a design.

However, I felt it would’ve been incomplete to conclude the piece there. Why? Theoretical benefits aren’t enough to overcome the tremendous inertia (read: laziness) of customers when it comes to online security products; just ask Pretty Good Privacy (PGP). As such, this piece then goes on to discuss what could enable SSI to go “beyond the whitepapers” and become something that could be on your phone in the near future.

SSI: the nuts and bolts

The premise of SSI is that it’s an identity you fully own — there is no central entity that manages that relationship. Whenever you want to establish your identity — be it to order a drink at a bar or to check into a flight — you can present “claims” about your identity that can be subsequently verified with cryptographic certainty by the authority that wants to check it.

There are three component parts to a self-sovereign identity solution, and we’ll be breaking each of these down sequentially.

Decentralized Identifiers (DIDs)

When we think about our identities today, be it the passport or student ID we present at an airport or the Facebook login we use to sign into an online game, they are invariably owned and managed by someone other than us. A decentralized identifier (DID) is an online identity that you can create that is truly owned by you.

The best analogy I’ve heard for DIDs is that they are a URL to your identity. Just as typing “https://r3.com/” into your browser takes you to our homepage, similarly your DID can be used to pull up a DID document (a “DIDDoc”) that contains a series of “claims” about your identity. More on those claims in a minute.

Before we take a closer look at the DIDDoc and the associated claims, it’s important to understand the core structure of the DID. Here’s what a DID on Corda Network could look like.

A sample DID on Corda. Image made by author.

Breaking it down into its component parts, the first component, the scheme, denotes that this is a digital identity, much like “http” denotes a website URL. The next term, here “corda”, is known as the method, and essentially refers to which network the DID lives on (say Corda or Sovrin, for instance). The next part (“tcn” in this example) refers to a subsystem or subnetwork, say the testnet versus the main network. Lastly, we have an identifier that is unique to you on that given network.
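To make that structure concrete, here is a minimal Python sketch that splits a DID into the four parts described above. The identifier in the example is made up for illustration (it is not a real Corda DID), and the names `ParsedDid` and `parse_did` are my own, not part of any Corda or W3C library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParsedDid:
    scheme: str       # always "did"
    method: str       # which network the DID lives on, e.g. "corda"
    subnetwork: str   # e.g. "tcn" for a test network
    identifier: str   # unique to the holder on that network


def parse_did(did: str) -> ParsedDid:
    """Split a four-part DID of the form scheme:method:subnetwork:identifier."""
    parts = did.split(":", 3)
    if len(parts) != 4 or parts[0] != "did" or not all(parts):
        raise ValueError(f"not a four-part DID: {did!r}")
    return ParsedDid(*parts)


# Hypothetical identifier, invented for this example.
sample = parse_did("did:corda:tcn:6aaa437d-b49c-4dc0-a3b5-d2f5e7c0a3e1")
print(sample.method, sample.subnetwork)
```

Note that not every DID method uses a subnetwork segment; this sketch only covers the four-part shape discussed here.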

Much like most identity solutions nowadays, DIDs are fundamentally underpinned by public-key cryptography (for a refresher, see part 1). A DID is generated and addressed by a public key, while the corresponding private key is held by the DID holder. This allows the DID owner to have complete control over their identity.

So, all this information about structure is great and all, but how can I use decentralized ID to get myself a beer? Let’s come back to the URL analogy. If the decentralized identifier is like a URL, then the analogy would mean that it takes us to a page that had all of our information on it, right? Well, not quite. And as you’ll see, such an approach would have a range of drawbacks. So what’s actually in the DID document?

The most fascinating part about this is that the DID Document itself doesn’t have any personally identifiable information (PII) stored on it. If it did, then it would be no better than a centralized solution. Arguably, it could actually be far worse than a centralized system if hashed PII were stored on a public blockchain. Blockchains love to advertise their immutability, and in this case that entails PII being publicly available in perpetuity. If the hash were to somehow be broken (I know quantum computing is the buzzword du jour), then all of that PII would be laid bare, and blockchain would have been hoist with its own immutability petard.
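It is worth adding that the hash doesn’t even need to be broken in the cryptographic sense for this to go wrong. Much PII is drawn from a small set of possible values, so anyone can simply hash every candidate and compare. The Python sketch below assumes a hypothetical record in which an unsalted SHA-256 hash of a date of birth was written to a public ledger; the date, the date range, and the function names are all invented for illustration.

```python
import hashlib
from datetime import date, timedelta


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def recover_birth_date(target_hash: str, first_year: int = 1900, last_year: int = 2025):
    """Try every calendar date in the range until one hashes to the target."""
    day = date(first_year, 1, 1)
    end = date(last_year, 12, 31)
    while day <= end:
        if sha256_hex(day.isoformat()) == target_hash:
            return day
        day += timedelta(days=1)
    return None


# Hypothetical ledger entry: an unsalted hash of a date of birth.
on_ledger = sha256_hex("1995-04-12")
print(recover_birth_date(on_ledger))
```

The search covers only about 46,000 candidate dates, which is why hashing alone is a weak way to protect this kind of data once it sits on an immutable ledger.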

So, how can a blockchain-based ID solution overcome the challenge presented by immutability? The answer lies in verifiable claims. Instead of containing personal information itself, the DIDDoc simply contains a series of credentials that can be verified. Instead of having text that says, “Arjun studies at Wharton”, it would have an external link to Wharton’s registry that would verify that I’m a student there.

This raises the question of how these credentials work, taking us to our next section.

Verifiable credentials

As we’ve seen, DIDs themselves don’t contain any personally identifiable information; that would run the risk of the ID system becoming obsolete quickly. As such, information such as my age, continuing the bar analogy, is contained in credentials. The fascinating part about the credential system is that it’s just an upgrade on precisely the way we identify ourselves today. Today, if one were to go to a bar, they would hand over a driver’s license, passport or equivalent. These are credentials, just in paper form. The challenge with paper credentials, though, is that they are hard to verify. If the license says that someone is 25 when they look no older than 17, the only way for the bar to check whether the age on the driver’s license was tampered with (or whether this person was issued a license at all!) is probably to get on the phone with the DMV. And who wants to get on the phone with the DMV in the middle of happy hour service?

Jokes aside, though, it can be extremely challenging to verify information presented in credentials. The advantage of a digital credential, over and above never having to worry about spilling coffee on it, is that someone who wants to verify your age (for instance) can verify the details through cryptography. Moreover, it’s something that you fully own, as opposed to something being managed by a third party.

So what can be proven, and what does the proof process look like?

Let’s go back to our favorite example: I want to prove my age to the bar with my driver’s license so that I can be served a drink. Let’s introduce some terminology here. The credential I want to use is my driver’s license, which makes some claims about my identity, amongst which is my age. I am the holder and subject of this credential, it was issued by my state government, and the bar wants to verify it.

To do so, instead of handing over my whole driver’s license, which would disclose personal information like my address, I can simply offer a verifiable presentation. A verifiable presentation presents only parts of the claim and allows certain elements to be verified cryptographically by the bar. More concretely, there are four key components that can be verified:

These four elements, especially the first and fourth, would be extremely hard to prove in a paper-based system. Say, for instance, that someone applying for a job claims to have studied at Wharton. However, as it happens, this individual was found guilty of plagiarism and had their degree revoked by the school. If this person simply showed a prospective employer the paper diploma they received at graduation, it would likely take several phone calls and a lot of hassle (or a query to a central database of degrees from the school) to see if this otherwise fully legitimate credential had been revoked by the issuer.

Let’s come back to that bar. One possible verifiable presentation would simply be my date of birth, which is a claim on my driver’s license. However, we can take this one level further when we realize that the bar doesn’t need to know my actual date of birth, rather simply whether I am over 21. Whether I am 22 or 29 is irrelevant to them. As such, we can even make a presentation of a derived claim — the binary question of whether I’m over 21 — and the bar can verify the answer to that question cryptographically.
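To illustrate the idea of presenting only part of a credential, here is a simplified Python sketch using salted hashes: the issuer commits to each claim separately, and I reveal only the claim I choose. This is one possible technique and not necessarily how any particular SSI stack does it. In this sketch the issuer computes the “over 21” claim at issuance and the issuer’s digital signature over the digest list is assumed rather than implemented; real systems may instead use zero-knowledge proofs. All function names and the address are hypothetical.

```python
import hashlib
import json
import secrets
from datetime import date


def digest(name: str, value, salt: str) -> str:
    """Hash one salted claim so it can be committed to without being revealed."""
    payload = json.dumps([salt, name, value])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def is_over(birth: date, years: int, today: date) -> bool:
    """True if at least `years` full years have passed since `birth`."""
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    return today.year - birth.year - (0 if had_birthday else 1) >= years


def issue(claims: dict):
    """Issuer side: salt every claim; return the holder's data and the digest list."""
    disclosures = {n: (v, secrets.token_hex(16)) for n, v in claims.items()}
    digests = sorted(digest(n, v, s) for n, (v, s) in disclosures.items())
    return disclosures, digests  # assumption: the issuer signs `digests`


def present(disclosures: dict, names: list) -> dict:
    """Holder side: reveal only the selected claims and their salts."""
    return {n: disclosures[n] for n in names}


def verify(presentation: dict, signed_digests: list) -> bool:
    """Verifier side: every revealed claim must match a committed digest."""
    return all(digest(n, v, s) in signed_digests for n, (v, s) in presentation.items())


birth = date(1995, 4, 12)  # hypothetical date of birth
claims = {
    "name": "Arjun Govind",
    "address": "123 Example St, Philadelphia",  # hypothetical
    "date_of_birth": birth.isoformat(),
    "over_21": is_over(birth, 21, today=date(2020, 1, 1)),
}
wallet, signed_digests = issue(claims)
shown = present(wallet, ["over_21"])  # the bar sees nothing else
print(verify(shown, signed_digests))
```

The bar learns a single yes-or-no answer; my name, address and exact date of birth never leave my wallet, and flipping the revealed value would no longer match any committed digest.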

Supporting actors: agents and decentralized key management

Finally, no ecosystem is complete without supporting actors. The most evident of these is the actual interface that humans will use to interact with identity, referred to as “agents” in the SSI community. A good way to conceptualize these agents is to compare them to the mobile wallets used today to hold cryptocurrencies. The premise of agents is that they offer an interface to manage DIDs, verifiable credentials and presentations, much like a cryptocurrency wallet can hold and manage tokens. For those less familiar with the crypto world, another way to think of these agents is, quite broadly, as the next evolution of password management services like Dashlane or LastPass. The core distinction is that those services may require you to copy your details from the app to the platform you are logging into and authenticate accordingly. A verifiable presentation, on the other hand, allows the verifier to simply check a provided credential cryptographically.

Another pivotal element is how the public key infrastructure (PKI) is implemented in SSI systems. As we recall from the previous installment, the PKI helps map public keys and other virtual identifiers to real-world identities and ensures that there are systems to generate identifiers and recover compromised ones. Conventionally, this has been done by a third party, a Certificate Authority (CA), which verifies the link between the identifier and the real ID and creates a digital certificate to certify it. The implication is that for us to truly trust an identity, we must trust the CA that maps these identifiers to real identities. The problem is that if CAs are highly centralized, then any benefits of decentralized trust stemming from the DID/verifiable credential (VC) model are largely for naught. This means that implementing a decentralized key management system (DKMS) is essential for SSI.

The fascinating part is that, if implemented correctly, the ledger itself obviates the need for a CA. Because it’s a decentralized solution with no ‘higher authority’ to turn to, the burdens of real ID linkage and key recovery fall to participants on the network. Trusting that the identifiers are tied to real IDs happens through the verifiable credential system. Because credentials are issued to real people (Arjun Govind — not my DID — is issued a driver’s license or a college degree), the ability to show ownership of a credential links my DID to a real identity. Managing key recovery is a somewhat finer point — some methods that have been implemented here include offline recovery, using some physical token as a recovery tool, and a “trustee” model. The trustee model involves giving some encrypted account recovery information to some trusted friends, and in the event that one loses access to their phone (for instance), they can use this information to authenticate back in.
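To give a feel for how a trustee model could work, here is a Python sketch using Shamir’s secret sharing, in which a recovery secret is split among friends so that any chosen number of them can rebuild it, while fewer than that learn nothing. This is an assumption on my part: it is one well-known technique for social recovery, not necessarily what any given SSI implementation uses. The prime, the function names, and the 3-of-5 setup are chosen purely for illustration.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; all arithmetic is done modulo this


def split_secret(secret: int, trustees: int, threshold: int):
    """Return one share per trustee; any `threshold` shares recover the secret."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in the range [0, PRIME)")
    if not 1 <= threshold <= trustees:
        raise ValueError("threshold must be between 1 and the number of trustees")
    # Random polynomial of degree threshold - 1 whose constant term is the secret.
    coefficients = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, trustees + 1):
        y = 0
        for c in reversed(coefficients):  # Horner's rule
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares):
    """Rebuild the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


recovery_key = secrets.randbelow(PRIME)  # stands in for account recovery data
shares = split_secret(recovery_key, trustees=5, threshold=3)
print(recover_secret(shares[:3]) == recovery_key)
```

In practice each share would also be encrypted for the friend who holds it, in line with the “encrypted account recovery information” described above.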

Moving beyond the whitepapers

The theoretical advantages of self-sovereign identity — spanning decentralization to managing revoked credentials — are clear. However, that in and of itself is likely inadequate to drive wide-scale adoption. The quintessential example of theoretical benefits being insufficient for widespread adoption is PGP — the benefits were there, but it just wasn’t usable enough for the average person — or “Johnny”, as this iconic paper names him — to use it.

So what do I think can drive adoption? None other than Big Brother.

Photo by Thomas Kelley on Unsplash

The main driver I see that will allow SSI to extend “beyond the whitepapers” is regulation. While industry-specific regulations like KYC requirements and PSD2 are topics for subsequent posts, I want to home in now on perhaps the most-discussed policy on personal data management: the EU’s General Data Protection Regulation, or GDPR.

GDPR demands nothing short of a sea change in how organizations store and manage personal data. One of the greatest problems in GDPR compliance is dealing with the “right to erasure”, also referred to as the “right to be forgotten”, which allows users to request that a company “forget” them, that is, remove every instance of their data from its systems. The challenge this presents is threefold. For starters, how does the company know if the request is valid? To demonstrate this problem, consider the case of a disgruntled employee acting against their employer. To retaliate, the employee decides to try having their employer’s personal cloud storage account deleted. Of course, it’s not as simple as getting on the phone with Dropbox and asking them to hit delete; there will likely be several layers of authentication required. However, companies need to ensure that such authentication measures are highly reliable. This disgruntled employee may, for instance, be able to deduce answers to some of the security questions through social engineering.

However, even after moving past this problem, companies face the problem of figuring out what information should be removed. For instance, if a social media company gets a deletion request from a user who is tagged in photos posted by their friends, what information if any should be removed? I’m not a lawyer, so I can’t give a definitive answer — what constitutes personal data will likely be guided by precedent that develops over time. However, companies will have to invest a lot of time and money in attorney fees in figuring out that definition.

So say we know that the deletion request is valid and we know what data to delete. We’re all clear, right? Just go and hit delete, right? Well, not quite. In fact, this may be the hardest challenge of them all: finding that data. And it’s a little bit more involved than just Ctrl+F. The reality is that information is stored across several databases in a company, perhaps across several divisions, offices and maybe even countries. On a technical level, this could happen due to mirroring, a process you may have encountered while downloading software. If I’m based in Philly, it’ll be far quicker for me to access data held on a server in New York or Virginia than on one in Shanghai or Mumbai. On the flip side, if I’m in Bangalore, it makes much more sense for me to just use a server somewhere else in India. Redundant as this may seem, it’s very common in practice, especially in applications where microsecond differences in accessing data matter.

So there you have it. We first need to figure out if a deletion request is valid. Once we do so, we need to reach out to our lawyers and figure out what data needs to be deleted. And once we do that, we need to trace down every last bit of that data across all of our databases and delete it. And that’s just one fraction of GDPR compliance for you.

Clearly, companies will incur substantial costs in reorganizing data to make it easier to handle these requests, and that’s where I believe an opportunity lies. Instead of risking GDPR’s hefty fines and fees, companies could use this opportunity to invest in an alternative, GDPR-compliant, self-sovereign approach to identity.

In this article, we took a look under the hood at self-sovereign identity and examined the fundamental building blocks (decentralized identifiers and verifiable claims) that underpin it. After understanding its theoretical benefits, we explored why this technology hopefully won’t be confined to whitepapers by examining the tailwinds behind it, the most notable of which is regulation. In the coming weeks, I’ll be exploring applications of digital identity solutions in specific industries, with an emphasis on FinTech.

Learn more about what we at R3 are doing in decentralized identity.


Sources:

Decentralized Identifiers — W3
Verifiable Credentials Data Model — W3
A Deep Dive: The Right to be Forgotten — Harvard University
Decentralized Key Management Systems — Rebooting Web of Trust 4