One of the most difficult problems in cryptographic key management is

keeping a secret key safe from both compromise and loss. If you don’t make

enough backups, the key might be destroyed in a hardware failure or

natural disaster. But if any backup is compromised, the key is

compromised.

Rather than invent new tools, one might try to solve the problem

with encryption. Just encrypting the key normally is not helpful, since

then you have the

same problem with the key needed to decrypt it. Instead, if you encrypt

your key using multiple keys, then

store these keys in different locations, you need all the keys to

decrypt it. This protects against compromise, but if any key is lost,

so is the original key. To fix this,

you’d have to encrypt the original key many times, each time using just

some of

the keys. For example, suppose you had 5 keys and you wanted to be able

to decrypt using any 3 of them. Then you’d need to perform all the

following encryptions:

- E
_{1}(E_{2}(E_{3}(K))) - E
_{1}(E_{2}(E_{4}(K))) - E
_{1}(E_{2}(E_{5}(K))) - E
_{1}(E_{3}(E_{4}(K))) - E
_{1}(E_{3}(E_{5}(K))) - E
_{1}(E_{4}(E_{5}(K))) - E
_{2}(E_{3}(E_{4}(K))) - E
_{2}(E_{3}(E_{5}(K))) - E
_{2}(E_{4}(E_{5}(K))) - E
_{3}(E_{4}(E_{5}(K)))

The number of encryptions needed grows very quickly. It also takes

quite a bit of time and space to store all these multiply-encrypted

keys. Let’s look at some other solutions.

One solution is to simply break the key file into two parts,

make a copy of each, and store the four parts in four locations. Then,

even if any one is compromised, the key is not revealed. Unfortunately,

this scheme is not that helpful, because just knowing half of someone’s

private key is enough to drastically decrease the time needed for a

brute-force search to crack the remainder of the key.

Here’s another simple solution: create a one-time pad (a file of random

data) the same size as the private key, then XOR them together. Put the

result in one place and the pad in another, and destroy the original

key. We call each file a “share” or “shadow”. Later, you can XOR the two shares

together to retrieve the original key. Since the bytes of each file are

random, you can be assured that neither piece will reveal any

information about the key by itself. We can extend this scheme to *n* pieces by just creating *n* – 1 one-time pads, adding them together, and subtracting this from the key data to obtain the *n*th share. Unfortunately, if even one piece is lost, so is the key. This is called an (*n,n*) secret sharing scheme, because we have *n* shares and need *n* of them to reconstruct the secret.

Here’s another more useful scheme: let the private key be represented as the number *S*. Choose a random line in the plane passing through the point (0,* S*). Next, choose *n* random points on that line as your *n* shares. Now, if we know any two of the shares, we know the line, and so can find *S*. If we only know one share, then we just know one point, and for every *T* there is a line passing through (0, *T*) and that point; hence, the secret could be anything at all. This is called an (*n,2*) secret sharing scheme, because we have *n* shares and need 2 of them to reconstruct the secret.

Particularly useful for individuals is the (3,2) scheme – you can keep

one share on your person on a floppy or USB pen drive at all times, one

share on your hard drive, and one share backed up at a secure remote

location. When you need to use the key, you recreate it briefly using

the shares on your hard drive and your person, then destroy it. If you

lose the floppy or your hard disk crashes, you can recreate the key

from the remaining shares (and make yourself 3 new shares). If any one

share is compromised, the key is not compromised. If you know of a

compromise, you can recreate the key, create 3 new shares, and destroy

the old ones, preventing the attacker from getting a second share. The

new shares are unrelated to the old ones because a different random

line through (0,*S*) is used.

In 1979, Shamir generalized these ideas even further to create a general (*n*,*k*) secret sharing scheme for any *n*, *k*. The idea is to choose a random degree-*k* polynomial passing through (0,*S*), where *S* is the secret. Then choose *n* random points on this curve. Because there is exactly one degree-*k* polynomial passing through any given set of *k* points, any *k* of these points define the curve and allow us to retrieve the secret. If we have only *k* – 1 points, then we don’t learn anything about *S*, because there is a polynomial passing through (0,*T*) and those *k* – 1 points for every *T*.

To make the notion of choosing random polynomials and random points

precise, Shamir works with polynomials modulo a prime number *p*, choosing polynomial coefficients and point coordinates randomly from the integers in [0, *p* – 1].

Secret sharing has more applications than just protecting private keys.

It’s also useful in situations where we want to entrust a secret to a

group of people but want to ensure that several of them must cooperate

to access the secret. We don’t need to trust them all individually, nor

do they all need to cooperate, so the secret is still available if

members go missing or refuse to cooperate.

Finally, I wouldn’t just tell you all this without providing some

working code you can use. You can download an excellent implementation

of the scheme for UNIX called secsplit (direct download).

I don’t know of any free .NET implementation, but I’m sure that porting

this 600-line program into any environment wouldn’t be difficult.

Some more resources on secret sharing:

- Lectures notes from Cornell’s CS 513
- Wikipedia article
- Adi Shamir’s “How to Share a Secret“

- RSA FAQ

The simple RAID 5 algorithm will do (n-1) restore trick (http://en.wikipedia.org/wiki/Redundant_array_of_independent_disks#RAID_5)

You may encrypt key and then split it on n-pieces (n >= 3) using “parity blocks”. You always can recover key using n-1 pieces.

Hi Alexander. You’re right that RAID 5 does split data in such a way that any n-1 of n bits allows recovery, and such that each set of k parallel bits stores k-1 bits of information. However, this is not a secure scheme; for example, if you know the contents of k-2 of the data disks, you know a (k-2)/(k-1) proportion of the file data, which even for 4 disks is 2/3 of it. The even distribution of the missing data in striping can make certain objects such as Unicode strings easy to extract from limited data. If the k disks were based on Shamir’s scheme, you would only have the same storage capacity as one disk, but any k-2 of the disks would be completely useless in determining the original data (which isn’t generally very nice, but might be handy for reusing disks that store confidential data).