One more entry to the Phaser challenge

Since I keep receiving entries to my programming challenge, I’ve compiled a list. New entries will be added to that list as they are received.

Last Saturday (4/3/04) I received another C# solution from Frans Bouma. In his own words:

“Hi,

This week I started my phraser solution in C#, and today I finished it. As you had to make it as clean as possible I did, and it might look somewhat verbose through the comments (no not // add 1 comments ;)).

The core phraser routine is just 34 lines with half of them { and } or comments.

I paste the complete code below. As a wordlist, you should simply use a textfile and on each line 1 word.

When you compile the code, simply run it like:

phraser.exe 6423946369 words.txt 2

where 2 is the maximum amount of digits. You can also specify -1 which means no limit.

The algo is performing ok and can be made even faster with a more clever hashtable setup, as I've explained in the comments :)

Cheers, and thanks for this fun compo. :)”

using System;

using System.Collections;

using System.IO;

namespace Phraser

{

    /// <summary>

    /// Phrase creator class, which creates all phrases possible of a phone number, using a word list.

    /// The algorithm used: read all words in the wordlist and encode them into digit sequences (thus the

    /// digit sequence required to form the word using the phone keyboard). Then match the digit sequences

    /// with the phone number digits to find words in the phone number.

   /// This is more efficient than the other algorithm: calculate all permutations of possible character

    /// sequences which can be formed with the phone number.

    /// </summary>

    public class PhraseCreator

    {

        #region Class Member Declarations

        private Hashtable _word2DigitSequence;

        private string _phoneNumber, _wordListFilename;

        private int _maxAmountOfDigitsInPhrase;

        #endregion

        #region Constructors

        /// <summary>

        /// Constructor.

        /// </summary>

        /// <param name="phoneNumber">The phone number to transform into phrases</param>

        /// <param name="wordlistFilename">The filename with the english words to use in phrases.</param>

        /// <param name="maxAmountOfDigitsInPhrase">The maximum amount of digits to be placed in the phrase.

        /// -1 means unlimited.</param>

        public PhraseCreator(string phoneNumber, string wordListFilename, int maxAmountOfDigitsInPhrase)

        {

            _phoneNumber = phoneNumber;

            _wordListFilename = wordListFilename;

            _word2DigitSequence = new Hashtable();

            _maxAmountOfDigitsInPhrase = maxAmountOfDigitsInPhrase;

        }

        #endregion

        #region Public methods

        /// <summary>

        /// Initializes the class for the phrasing process.

        /// Loads wordlist and encodes all words read into digit sequences. Wordlists have to have 1 word per

        /// line. Skips all words which are longer than the specified phonenumber.

        /// </summary>

        public void Initialize()

        {

            StreamReader reader = null;

            string wordFromFile;

            try

            {

                reader = new StreamReader(_wordListFilename);

             // loop through the lines in the file. Each word is on a single line. Strip whitespace

                while((wordFromFile = reader.ReadLine()) != null)

                {

                    wordFromFile = wordFromFile.Trim();

                    if((wordFromFile.Length==0) || (wordFromFile.Length>_phoneNumber.Length))

                    {

                        // not valid.

                        continue;

                    }

                    // line read, each line contains 1 word. Create digit sequence for word read and store it in

                    // the hashtable.

                    wordFromFile = wordFromFile.ToLower();

                    // transform it to a digit sequence so we can use it for easy matching later.

                string wordAsDigits = DigitizeWord(wordFromFile);

                    // store both in hashtable

                    if(!_word2DigitSequence.ContainsKey(wordFromFile))

                    {

                        _word2DigitSequence.Add(wordFromFile, wordAsDigits);

                    }

                }

            }

            finally

            {

                reader.Close();

            }

        }

        /// <summary>

        /// Starts the phrasing process by calling the routine which dives into the recursion.

        /// </summary>

        public void StartPhrasingProcess()

        {

            DoCreatePhrases(0, "", 0);

        }

        #endregion

        #region Private methods

        /// <summary>

        /// Creates all the possible phrases for the phone number passed in in the constructor.

        /// It uses the read words in the file which filename was passed in in the constructor.

        /// </summary>

        /// <param name="startIndex">the current index to start scanning in the phonenumber</param>

        /// <param name="phraseInProgress">the phrase which is currently being build</param>

        /// <param name="currentAmountOfDigits">the current amount of digits in the phraseInProgress</param>

        /// <remarks>

    /// It uses a simple algorithm. It walks all words read and uses their digit sequence representations to

        /// find them back in the phone number passed in starting with the first digit, and so on. If the current

        /// digit in the phone number is not usable in any word, it is kept as a digit in the phrase. After the

        /// complete wordlist has been processed for the current index, the digit is kept and the current index is

        /// increased with 1. Until the end of the phone number is reached and recursion stops and back tracks. The

        /// amount of digits is limited by the amount passed to the constructor. A '-' is placed between words.

        /// <br><br>

        /// It will write all found phrases to the console. Although the algorithm is very fast, it can be sped up

        /// even more by constructing a hashtable with per digit (0-9) a list of words which digit sequence representations

        /// start with that digit. This would greatly reduce the amount of words to loop through for each digit position.

        /// </remarks>

        private void DoCreatePhrases(int startIndex, string phraseInProgress, int currentAmountOfDigits)

        {

            if(startIndex >= _phoneNumber.Length)

            {

         // we have scanned the complete phonenumber through recursion. We can exit now by printing the phrase

                // we've found to the console.

                Console.WriteLine("Phrase found: {0}", phraseInProgress);

                return;

  }

            string newPhraseInProgress = String.Empty;

            foreach(DictionaryEntry wordDigitSequencePair in _word2DigitSequence)

            {

                string wordAsDigitSequence = (string)wordDigitSequencePair.Value;

        // check if word matches the digits starting on startindex in phonenumber. If so, we have found a word.

                if((startIndex + wordAsDigitSequence.Length) > _phoneNumber.Length)

                {

                    // word can never match as it is longer than the remaining digits.

                    continue;

                }

                if(_phoneNumber.Substring(startIndex, wordAsDigitSequence.Length) == wordAsDigitSequence)

                {

                    // Match found. Dive into recursion with this new phraseInProgress

                    DoCreatePhrases(startIndex + wordAsDigitSequence.Length, MakeNewCurrentPhrase(phraseInProgress,

                        wordDigitSequencePair.Key.ToString()), currentAmountOfDigits);

   }

            }

            if((_maxAmountOfDigitsInPhrase == -1) || (_maxAmountOfDigitsInPhrase > currentAmountOfDigits))

            {

                // simply use the digit as a 'word' in the phrase and continue with the next digit. Then dive into recursion

                // with this new phraseInProgress

                DoCreatePhrases(startIndex+1, MakeNewCurrentPhrase(phraseInProgress, _phoneNumber[startIndex].ToString()),

                    currentAmountOfDigits+1);

            }

  }

        /// <summary>

        /// Creates a new phrase based on the two inputs. If the current phrase is empty, no '-' is emitted.

        /// </summary>

        /// <param name="currentPhrase">the current phrase we're working on</param>

      /// <param name="wordToAdd">the new word to add to the current phrase</param>

        /// <returns>new current phrase</returns>

        private string MakeNewCurrentPhrase(string currentPhrase, string wordToAdd)

        {

            if(currentPhrase.Length > 0)

            {

                return currentPhrase + "-" + wordToAdd;

            }

            else

            {

                return wordToAdd;

            }

        }

        /// <summary>

        /// Transforms the passed in word into a sequence of digits, the same sequence which is required to press

        /// on a phone keyboard to form the word.

        /// </summary>

        /// <param name="word">The word to transform into digits</param>

        /// <returns>The word as a string of digits</returns>

        /// <example>"nice" will be transformed into: "6423"</example>

        private string DigitizeWord(string word)

        {

            // get word as characters

            char[] wordAsCharacters = word.ToCharArray();

            char[] wordAsDigitSequence = new char[wordAsCharacters.Length];

            // digitize the word using CharacterToDigit

            for (int i = 0; i < wordAsCharacters.Length; i++)

            {

                wordAsDigitSequence[i] = CharacterToDigit(wordAsCharacters[i]);

            }

            // create a new string from the digits which are stored as characters.

            return new string(wordAsDigitSequence);

        }

        /// <summary>

        /// Transforms the passed in character, which can be a member of the range a-z, to a digit, which can be

        /// 2-9 (as only the keys 2-9 have characters on them on a phone keyboard).

        /// </summary>

        /// <param name="character">The character to transform into a digit</param>

   /// <returns>The digit representation of the passed in character.</returns>

        /// <example>'n' will be transformed into '6'</example>

        private char CharacterToDigit(char character)

        {

            // set variable to return to a default: 2.

            char digitToReturn = '2';

            // do the transformation.

            switch(character)

            {

                case 'a':

                case 'b':

                case 'c':

                    digitToReturn = '2';

       break;

                case 'd':

                case 'e':

                case 'f':

                    digitToReturn = '3';

                    break;

                case 'g':

                case 'h':

                case 'i':

            digitToReturn = '4';

                    break;

                case 'j':

                case 'k':

                case 'l':

                    digitToReturn = '5';

                    break;

                case 'm':

                case 'n':

  case 'o':

                    digitToReturn = '6';

                    break;

                case 'p':

                case 'q':

                case 'r':

                case 's':

                    digitToReturn = '7';

                   break;

                case 't':

                case 'u':

                case 'v':

                    digitToReturn = '8';

                    break;

                case 'w':

                case 'x':

                case 'y':

                case 'z':

                    digitToReturn = '9';

                    break;

            }

            return digitToReturn;

        }

        #endregion

    }

    /// <summary>

    /// Startup class for the application.

    /// </summary>

    public class Startup

    {

        /// <summary>

        /// The main entry point for the application.

        /// Expects 3 parameters:

        /// parameter 1: phone number to phrase.

        /// parameter 2: filename with English words.

        /// parameter 3: max amount of digits to place in the phrase. can be -1, which means 0 maximum set.

        /// </summary>

        [STAThread]

        static void Main(string[] args)

        {

            // check input

            if(args.Length < 3)

            {

           // wrong amount of arguments

                Console.WriteLine("Usage: Phraser Phonenumber Filename MaxAmountOfDigits" + Environment.NewLine +

                    "where 'Filename' is the file with words to use" + Environment.NewLine +

             "where 'MaxAmountOfDigits is the maximum amount of digits to end up in the phrase. Can be -1 (no limit)");

                return;

            }

            try

            {

                // 3 arguments specified. Create the phrases. Use the PhraseCreator class to do that.

                PhraseCreator thePhraser = new PhraseCreator(args[0], args[1], Convert.ToInt32(args[2]));

                thePhraser.Initialize();

                // perform the phrasing

                thePhraser.StartPhrasingProcess();

            }

            catch(Exception ex)

            {

                // display the exception in the console.

                Console.WriteLine("Exception caught: {0}", ex.Message);

                Console.WriteLine("Source: {0}", ex.Source);

                Console.WriteLine("Stacktrace: {0}", ex.StackTrace);

            }

        }

    }

}