Sshhh… it’s a secret (Matt Gertz)

Anthony D. Green [MSFT]

Back in seventh grade, I had a pretty heavy crush on a girl named Melissa.  She was a good friend of mine, and so I had no problems talking to her.  However, being painfully shy back then, I was far too nervous to ever ask her to “go with” me.  (For those not steeped in American traditions, to “go with” a girl or boy in junior high school meant that you were publicly asserting that the two of you were somehow romantically involved, whatever that actually meant when you were only twelve years old.  It was always unclear to me why the term “go with” was used, since, being seventh graders, you never actually went anywhere.  By the time you had access to a car and could actually go somewhere together, you were “dating,” not “going with.”  In practice, what “going with” really meant was that the girl felt obliged to call the boy on the phone on a daily basis, and if the boy didn’t muster up the courage to ask the girl out onto the floor in front of all of her friends at the next school dance, she would cry and break up with him.)

Anyway, I was able to reconcile my cowardice by writing notes to her expressing my affections.  Well, in code, anyway – I was somewhat insensitive as well as shy.  I used a simple 1:1 cipher where I mapped invented hieroglyphs to letters of the alphabet (essentially identical to writing a love letter using the WingDings font, though we didn’t have PCs back then).  This allowed me to feel good that I had done *something* to tell her how I felt about her, without actually putting me in emotional danger.   You see, I never gave her the solution key!

This worked fairly well for my ego for a couple of weeks, until I learned from a friend that she had deciphered at least half of my invented alphabet, using letter frequency to figure them out (it turned out that she liked puzzles of this sort).  Mortified, I immediately stopped sending her any new notes, and so, alas, she never learned how the depths of my feelings for her.

That junior high memory came back to me on Christmas Day, as I was reading through a book that my wife (apparently, I did overcome my shyness at some point) had given me as a gift.  It’s a fascinating book entitled “The Six Unsolved Ciphers” by Richard Belfield.  In it, Mr. Belfield walks the readers through such confounding puzzles as the Kryptos puzzle at the new CIA headquarters and the Shepherd’s monument.  The book really struck a chord with me.  As a kid, I was motivated to create all sorts of codes, alphabets, and languages, but in recent years, while I’ve certainly used cryptology – heck, we all do on a daily basis, whether we realize it or not – I hadn’t really done much with it other than to leverage functionality that was already there.  However, in reading Mr. Belfield ‘s description of the Vigenère encoding scheme, I realized that I had to try that one out, and I started working out the code.  Ultimately, I realized that this would make a fun blog topic (and for all I know, this has already been blogged somewhere, but I’ll press on nonetheless and see where we come out).

The Vigenère method

Before I jump into the coding, I’d better briefly describe the Vigenère method .  The method (and I’m borrowing heavily from Mr. Belfield here) involves both a key and an alphabet, and the encrypted letters result from a mathematical relation leveraging both of those.  The key can be any word – let’s say “MICROSOFT” – and the alphabet would be something like the letters A-Z — they might be in a different order known only to the sender and the receiver, but let’s assume for the purpose of this example that they are in normal English alphabetical order.  First thing I need to do is to create a grid with the alphabet offset by one each row (or column):

ABCDEFGHIJKLMNOPQRSTUVWXYZ

BCDEFGHIJKLMNOPQRSTUVWXYZA

CDEFGHIJKLMNOPQRSTUVWXYZAB

DEFGHIJKLMNOPQRSTUVWXYZABC

EFGHIJKLMNOPQRSTUVWXYZABCD

FGHIJKLMNOPQRSTUVWXYZABCDE

GHIJKLMNOPQRSTUVWXYZABCDEF

HIJKLMNOPQRSTUVWXYZABCDEFG

IJKLMNOPQRSTUVWXYZABCDEFGH

JKLMNOPQRSTUVWXYZABCDEFGHI

KLMNOPQRSTUVWXYZABCDEFGHIJ

LMNOPQRSTUVWXYZABCDEFGHIJK

MNOPQRSTUVWXYZABCDEFGHIJKL

NOPQRSTUVWXYZABCDEFGHIJKLM

OPQRSTUVWXYZABCDEFGHIJKLMN

PQRSTUVWXYZABCDEFGHIJKLMNO

QRSTUVWXYZABCDEFGHIJKLMNOP

RSTUVWXYZABCDEFGHIJKLMNOPQ

STUVWXYZABCDEFGHIJKLMNOPQR

TUVWXYZABCDEFGHIJKLMNOPQRS

UVWXYZABCDEFGHIJKLMNOPQRST

VWXYZABCDEFGHIJKLMNOPQRSTU

WXYZABCDEFGHIJKLMNOPQRSTUV

XYZABCDEFGHIJKLMNOPQRSTUVW

YZABCDEFGHIJKLMNOPQRSTUVWX

ZABCDEFGHIJKLMNOPQRSTUVWXY

 

So, let’s say I want to encode the phrase “ERIC KNOX SHOULD POST MORE BLOGS”.  The first letter in the keyword (MICROSOFT) is M, the first letter in the sentence is E, so I find the M row and the E column and note where they intersect – this gives us the letter Q.  The next letter in the keyword is I and the next letter in the sentence is R, and that yields Z.  We ignore spaces, and when we run out of keyword letters, we just start again at the beginning of the keyword.  This ultimately yields the encoded phrase “QZKTYFCCLTWWCRHCXMYWTVPDCLL”.

To decode the phrase, I work backwards – I find the Q in the M row, and ascend the column to find the E at the top, then find the Z in the I row and ascend to find the R at the top, and repeat this until I get back to “ERICKNOXSHOULDPOSTMOREBLOGS”.  It’s pretty simple, though tedious work for messages of any meaningful length when doing it by hand.  According to Mr. Belfield, this sort of code apparently was all but unbreakable for hundreds of years – nowadays, people have to change keywords at random times, throw in random junk in the text, or alter the alphabet ordering, to make it tougher to decode.

Creating a Vigenère encode/decode application

In coding up the Vigenère method, I knew that I’d want to make provision for code which used a non-standard alphabet (as does the Kryptos cipher that Mr. Belfield writes about) – for example, to make decoding trickier, I might make an alphabet that leverages an additional code word such as my last name:

GERTZABCDFHIJKLMNOPQSUVWXY

All 26 letters are still there; I’ve just pushed a few up to the front.  The advantage is that not knowing the alphabet ordering makes it tougher for others to crack my code.

The other thing I immediately decided was that creating a 2-dimensional array to hold the offsets of the alphabet would be incredibly inelegant as well as wasteful of memory.  Since the ordering of each row is deterministic, being offset by one each time, a simple index into the alphabet should suffice, wrapping around to the front of the alphabet when needed. (Think of the alphabet as being a circular list if that helps visualize what I mean).

I’ll start out the project as usual:  I’ll create a Visual Basic Windows Forms application (“VBEncrypt”).  Now, on the form, I’ll drag out labels and edit boxes for the Key, the Alphabet, the Clear Text, and the Encoded Text (the latter two boxes set to both “Multiline=True” and “Scroll bars=Vertical” in the properties).  I’ll also added some checkboxes to handle niceties – “Case-Insensitive” to allow the user to ignore casing, and “Group Letters When Encoding” (with an accompanying edit box) to cause spaces to be inserted every n characters when emitting the encrypted characters – i.e.,  “QZKT YFCC LTWW CRHC XMYW TVPD CLL” vs “QZKTYFCCLTWWCRHCXMYWTVPDCLL”.  Finally, I’ll add a couple of buttons to do the actual encoding or decoding.  My final layout is in a JPG in the ZIP file attached to this post.

The idea is that the user specifies the Key and the Alphabet (note that I’ll default to a normal alphabet in the interest of helping the user if they don’t want to type all 26 characters every time).  To encode, the user pastes text into the left pane and presses the “–>” button to encode it.  To decode, the user pastes encrypted text in the right pane and presses the “<–“ arrow.

Before we get to the guts of the code, I’m going to do a little housekeeping first.  I’m going to need to do validation on the Key, Alphabet, and the “Letters per group.”  For the Key, there are three issues – it must be non-empty, it must have no spaces (these are forbidden, since spaces are equivalent to NULLs and are used to simply make outputs more readable), and it must not use any characters not in the Alphabet.  The first two cases are easy to code:

    Private Function VerifyKey() As Boolean

        Dim Key As String = Me.edtKey.Text

 

        If String.IsNullOrEmpty(Key) Then

            MsgBox(“The Key contains no characters.”)

            Return False

        End If

 

        If Key.IndexOf(” “) <> -1 Then

            MsgBox(“The key contains one or more space characters.”)

            Return False

        End If

        Return True

    End Function

 

Whereas the third case (alphabet validation) is more complicated and, since I expect it to be rare, I’ll defer checking for it until I’m actually doing the encoding/decoding so as not to penalize the typical case performance-wise.  (Note , by the way,that I am hard-coding my MsgBox strings – this is just for blog clarity, and normally I’d put the strings into resources so that they could be localized into non-English languages.)

For the Alphabet verification, it’s nearly identical – I need the alphabet to be non-empty and to not contain spaces.  However, I also want to make sure that the Alphabet contains no duplicate characters, because that will make coding or decoding problematic.  To check for duplicate characters in a string, people usually sort the string using a selection sort and then walk through it looking for multiples step-by-step.  I’m lazy, though, and so I’m going to leverage the “Distinct()” method on the String class – this will do the sort for me and return an IEnumerable containing the unique characters in the string.  If its count of that IEnumerable isn’t identical to the length of the original string, then I know there’s a problem, and all it’ll cost me is just an IEnumerable object that I otherwise won’t use:

    Private Function VerifyAlphabet() As Boolean

        Dim Alphabet As String = Me.edtAlphabet.Text

 

        If String.IsNullOrEmpty(Alphabet) Then

            MsgBox(“The Alphabet contains no characters.”)

            Return False

        End If

 

        If Alphabet.IndexOf(” “) <> -1 Then

            MsgBox(“The alphabet contains a space character.”)

            Return False

        End If

 

        Dim distinctLetters As IEnumerable(Of Char) = Alphabet.Distinct()

        If distinctLetters.Count <> Alphabet.Length Then

            MsgBox(“The alphabet contains duplicates.”)

            Return False

        End If

 

        Return True

    End Function

 

For the GroupSize box, I want to enforce having only non-negative integers.  I’ll do this on the validation.  At the top of the editor, I’ll select the edtGroupSize control in the left dropdown, then in the right dropdown I’ll choose “Validating.”  I’ll then fill in the resulting code thusly:

    Private Sub edtGroupSize_Validating(ByVal sender As Object, _

ByVal e As System.ComponentModel.CancelEventArgs) Handles edtGroupSize.Validating

        If Not IsNumeric(edtGroupSize.Text.ToString) OrElse CInt(edtGroupSize.Text) < 0 _

OrElse edtGroupSize.Text.IndexOf(“.”) <> -1 Then

            ‘ Cancel the event and select the text to be corrected by the user.

            e.Cancel = True

            edtGroupSize.Select(0, edtGroupSize.Text.Length)

            MsgBox(“Only non-negative whole numbers are allowed in this field.”)

        End If

 

    End Sub

 

One last detail before we get to the meat of the code:  I need to link the availability of the GroupSize box to the state of the Group checkbox:

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e _

As System.EventArgs) Handles MyBase.Load

        edtGroupSize.Enabled = (chkGroup.CheckState = CheckState.Checked)

    End Sub

    Private Sub chkGroup_CheckedChanged(ByVal sender As System.Object, _

ByVal e As System.EventArgs) Handles chkGroup.CheckedChanged

        edtGroupSize.Enabled = (chkGroup.CheckState = CheckState.Checked)

    End Sub

The Encrypting Code

We’ll now dig into the encoding.  By double-clicking on the Encode button, I’ll generate the boilerplate header:

    Private Sub btnEncode_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnEncode.Click

 

Next, I’ll call the Validation methods I set up in the previous session:

        If VerifyKey() AndAlso VerifyAlphabet() AndAlso Not String.IsNullOrEmpty(Me.edtClearText.Text) Then

 

I’m going to pull out all of my string values from the form a priori, since I know I’m going to be in a loop and I detest dereferencing controls or string lengths more than necessary.  Note that, when I pull in the text to encode (ClearText), I’m also stripping out any spaces from it using the Replace() method:

            Dim Alphabet As String = Me.edtAlphabet.Text

            Dim Key As String = Me.edtKey.Text

            Dim ClearText As String = Me.edtClearText.Text.Replace(” “, “”)

            Dim ClearTextLength As Integer = ClearText.Length

            Dim AlphabetLength As Integer = Alphabet.Length

            Dim KeyLength As Integer = Key.Length

 

The user might have selected “Case Insensitive,” and if that’s the case, I’ll upper-case everything in all of the strings to make the comparisons work:

            ‘ Push everything to uppercase if we are being case-insensitive:

            If chkCaseInsensitive.CheckState = CheckState.Checked Then

                Alphabet = Alphabet.ToUpper

                Key = Key.ToUpper

                ClearText = ClearText.ToUpper

            End If

 

Now, I need a place for the encrypted code to go.  I’ll  use a StringBuilder for this, because (as I’ve mentioned in earlier posts) it is a lot less wasteful than concatenating strings.  In fact, in this case, I already know the size of the text buffer I’ll need – the encoded text will be the same size as the clear text:

            Dim EncodedText As New StringBuilder(ClearTextLength)

 

As I mentioned above, I’m going to have to cycle through the Key, so I’ll need an index into it:

            Dim KeyIndex As Integer = 0

 

And I’ll proactively determine if the user wants me to add spaces between groups of letters:

            Dim GroupSize As Integer = -1

            If Me.chkGroup.CheckState = CheckState.Checked Then

                GroupSize = CInt(edtGroupSize.Text)

            End If

 

Now, we come to the place where the work gets done.  I will step through each letter of the ClearText and mathematically determine its corresponding encoded value.  The first thing to do is to determine the “row.”  I’ll get the current letter pointed to in the Key, and figure out where that letter lives in the Alphabet.  In a normal alphabet, I could simply calculate its offset from the letter “A”, but I mentioned earlier I want to support arbitrary alphabets and so I use “IndexOf()” instead.  (If IndexOf() returns a -1, then I know that the Key uses a letter not in the Alphabet, and will abort the operation with an appropriate message.)  The code is:

            For i As Integer = 0 To ClearTextLength – 1

                Dim RowOffset As Integer = Alphabet.IndexOf(Key(KeyIndex))

                If RowOffset = -1 Then

                    MsgBox(“Key contains character” & Key(KeyIndex) & “not in the alphabet.”)

                    Return

                End If

 

Knowing the row offset, I don’t need a 2-D array – I’ll just pretend that the Alphabet starts RowOffset characters later than it does, wrapping around if necessary – but I’ll get to that later. 

Now, let’s track down the column.  I’ll get that from the current ClearText letter itself, locating it within the Alphabet via IndexOf().  Again, I can report & abort if the text to encode contains characters that are not in the Alphabet:

 

                Dim ColOffset As Integer = Alphabet.IndexOf(ClearText(i))

                If ColOffset = -1 Then

                    MsgBox(“ClearText contains character” & ClearText(i) & _

 “not in the alphabet.”)

                    Return

                End If

 

Now, the encoding is easy. Moving along rows or along columns just changes the letter by one index on each step.  To select the “row” is identical to just advancing through the Alphabet by RowOffset characters and pretending that the Alphabet starts there.  I then add ColOffset to that value in order to account for the horizontal displacement, following the same logic.  Because the sum might “wrap around” back to the front of the Alphabet, I need to perform a modulus on the sum before using it as an index to Alphabet, so I won’t reference an illegal index:

                Dim EncodedChar As Char = Alphabet((RowOffset + ColOffset) Mod AlphabetLength)

                EncodedText.Append(EncodedChar)

 

It’s just that simple!  Now I’ll add any spaces if requested (again using a modulus to see if we’ve reached the n-1 point where a space needs to go), move to the next letter in the Key (again, modulus will help me wrap around to the beginning if needed), and throw the encoded letter into the buffer:

                ‘ Add spaces if requested:

                If GroupSize > 0 Then

                    If i Mod GroupSize = GroupSize – 1 Then

                        EncodedText.Append(” “)

                    End If

                End If

 

                ‘ On to the next character in the key…

                KeyIndex = (KeyIndex + 1) Mod KeyLength

            Next

 

After the final loop, I’ll update the text in the right-hand pane, and we’re done:

            Me.edtEncodedText.Text = EncodedText.ToString

 

        End If

    End Sub

 

Decoding is very similar to encoding, except that I use the EncodedText instead of the ClearText and, once I’ve found the “row” based on the current letter in the Key, I need to subtract the “column” instead of adding it – this means I also have to add the length of the alphabet to the result before taking a modulus, because modulus doesn’t work on negative numbers and I still need to account for wrap-around in the leftward direction.  The result looks like this:

    Private Sub btnDecode_Click(ByVal sender As System.Object, ByVal e _

As System.EventArgs) Handles btnDecode.Click

        If VerifyKey() AndAlso VerifyAlphabet() AndAlso _

Not String.IsNullOrEmpty(Me.edtEncodedText.Text) Then

 

            Dim Alphabet As String = Me.edtAlphabet.Text

            Dim Key As String = Me.edtKey.Text

            Dim EncodedText As String = Me.edtEncodedText.Text.Replace(” “, “”)

            Dim EncodedTextLength As Integer = EncodedText.Length

            Dim AlphabetLength As Integer = Alphabet.Length

            Dim KeyLength As Integer = Key.Length

 

            ‘ Push everything to uppercase if we are being case-insensitive:

            If chkCaseInsensitive.CheckState = CheckState.Checked Then

                Alphabet = Alphabet.ToUpper

                Key = Key.ToUpper

                EncodedText = EncodedText.ToUpper

            End If

 

            Dim ClearText As New StringBuilder(EncodedTextLength)

 

            Dim KeyIndex As Integer = 0

 

            For i As Integer = 0 To EncodedTextLength – 1

                Dim RowOffset As Integer = Alphabet.IndexOf(Key(KeyIndex))

                If RowOffset = -1 Then

                    MsgBox(“Key contains character” & Key(KeyIndex) & “not in the alphabet.”)

                    Return

                End If

 

                Dim ColOffset As Integer = Alphabet.IndexOf(EncodedText(i))

                If ColOffset = -1 Then

                    MsgBox(“ClearText contains character” & EncodedText(i) _

& “not in the alphabet.”)

                    Return

                End If

 

                Dim DecodedChar As Char = Alphabet((AlphabetLength + (ColOffset – RowOffset)) _

Mod AlphabetLength)

                ClearText.Append(DecodedChar)

 

                KeyIndex = (KeyIndex + 1) Mod KeyLength

            Next

            Me.edtClearText.Text = ClearText.ToString

 

        End If

 

    End Sub

 

And that’s it!  The final application is attached to this post. 

This was code I particularly enjoyed writing – I like the simple elegance of a good mathematical model. For me, though, the most poignant part of this whole exercise was remembering that APTNW WBOIA DLMJV RDATM QSOVU YJWLP ZJALN APZJL DNXRO VFUVA RQYOY ZFMAR CLDBZ XJJYY PBZQO IUWRN AOZHP NYIZW MSWMU YJWNO MQATD YRKUY RINRZ NDADP MMGIV BUPRQ.

‘Til next time,

  –Matt–*

VBEncrypt.zip

0 comments

Leave a comment

Feedback usabilla icon