Code For Decoding Codes – A Programming Project Discussion

I love to collect fun coding projects. I like short, easy to understand projects that get students to use a number of things (language constructs, programming concepts) in interesting ways that take advantage of what computers do best – simple but tedious tasks. If the task helps students to internalize some basic foundational understanding of how computing works that’s all the best. I look for projects that give we things to discuss with the students.

One of my favorite early coding projects was counting letters. I did this first in my very first programming course taken about 35 years ago. It was written in FORTRAN of course and read the data in from punch cards. It was the basic read in some data text and report back how many times each letter was used. For bonus points count words and sentences. Oh boy! Lots of fun. Well for someone who was (and still is) very excited about making computers do things it was fun. But useful? Not so much.

Recently though I found an expansion of that simple project that I think has a higher, wider level of interest. Lots of kids play with codes; usually simple substitutions or letter mapping. Offsetting a letter by some distance (a’s become c’s for example) is pretty common. Back in the old days of the Internet we used something called ROT-13 to hide spoilers and “adult language.” So how can we use that to make a project that hopefully will seem relevant and interesting to high school students? I think I found the answer.

This project description comes for the informational page for the CTCSTA Programming Contest and held at St. Mary-St. Joseph School (Willimantic, CT) later this month. As an aside, if you are in that area it looks like they have room for more teams.

Any one-to-one mapping, f, of any alphabet to itself can be used to encode text by replacing each occurrence of any letter, c, with f(c). One such mapping could be the mapping of a letter to three positions beyond the letter in the alphabet. That is, a --d, b --e, c-- f, d --g and so on. With this mapping, “The car is blue.” Will be encoded as “Wkh fdu lv eoxh.” Now suppose we do not know by how much the characters have been shifted. One possibility is to scan the text and build a frequency table (a histogram) of all the alphabetic characters. For the purposes of building the frequency table, we shall consider lower case and upper case characters to be equivalent. After building the table, we can find out the most frequently occurring character. From analysis of large numbers of English texts, it has been found that the character ‘e’ is the most frequently occurring character.

For example, suppose we find the character ‘g’ to be the most frequent character in the given encoded text. Then we conclude that the document was shifted by 2.

Write a program that decodes the content of the given input and outputs the decoded text.

There are lots of ways to identify and count letters of course. The brut force way is to have 26 (or more if you haven’t discovered how to upper case or lower case text before comparing) IF statements that do text compares. Yuck! Boring and tedious. So most people take advantage of the fact that text is stored as an ASCII value and that one can do mathematical operations on a character. This mean a programmer can make a more general purpose compare statement and use a numeric value as an index to an array. Cool – we’ve got arrays and ASCII in the project now. What next?

Ah, bounds checking! What happens when one adds to a “z” or subtracts from an “a” while doing our rotation? So now we have bounds/range checking and the introduction to a circular list. Even better. This is a step up from a basic letter counting routine. Plus for fun we are solving a puzzle (how much is the data rotated?) and we are decoding a coded message. Counting letters with a purpose!

Now if we want to get all “mathy” (as Steven Colbert might say just to add a cultural reference to my post) we can also ask students to report all sorts of statistics on the letters they count. Once they realize how easily they can do all sorts of other things inside a loop this becomes easy and they can amaze themselves with how much they can do in how little time.

We can also have some other interesting discussions though. What do we do if two or more letters are tied for the most occurrences? Do we translate several ways and let a person pick the right on? Or can we automate the process and check words against a word database to see which translation works?

And what about program design? Can we create standard routines for rotating letters that we can use and reuse for both coding and decoding message? How much of the code that we create belongs in small easy to test routines? Are their features in the language or libraries that make our life more easy? Or do we have to invent a lot of new code? If students know multiple languages is this program easier or harder in specific languages? (Think about different ways to convert from a character to a number value and breaking up strings into characters for example.)