Phone Number Word Games part 2 (by CalvinH)


As I said before in this blog, I have an optimized version of a program to solve a programming challenge


To recap, the challenge was to enumerate the possible phrases that can be obtained from a phone number, like 625-329-4741 is MakeAWish1


 


I understood the original challenge to be to solve the problem, without regard to efficiency. So, in a couple hours, I came up with a solution that wasn’t very efficient that I posted last week. Subsequently, I polished the optimized version below.


 


You can try this algorithm out by visiting this site and entering any phone number or string of letters. (Sorry, this site is internal to Microsoft only: I don’t want to bog down my tired old server, described in this blog)


 


It takes up to 15 seconds to calculate phrases based on a phone keypad  and look them up in a 50,000 word dictionary.


 


For the original phone number of “642-394-6369” there were 20101 results calculated in about 10 seconds.


 


 


&& There are 3 parts to this algorithm.


&& First (GetDigSeqs) all the possible unique digit subsequences are enumerated into DigSeq.


&& For 123, this would be (123), (23), (3). For N digits, it would be maximum N sequences.


&& Part 2: (EnumNum) for each DigSeq, enumerate all possible letter words of length from 1 to len(DigSeq),


&& look them up in the dictionary, and place unique ones in SubWords


&& Part 3: (FormPhrase) For each digseq and, subword, assemble an N length phrase


&& EnumNum This part uses 2 parallel strings: the number to permute and a pattern


&& cPattern governs the pattern of letters to use for each digit’s place


&& a “0” for each digits place to indicate which letter of the digit it is (0,1,2,3). 0 indicates use the raw digit


&& At the end, it’ll be “1” (for 0,1), “3” (for 2,3,4,5,6,8), or “4” for “7,9”, like “3334433143”


 


CLEAR


cNumber=INPUTBOX(“Phone #?”,”Phone #”,”642-394-6369″)          && number of patterns is 4^8*5^2 = 1638400


dtStart=SECONDS()


GetPhrases(STRTRAN(cNumber,”-“,””))          && remove dashes


dtEnd=SECONDS()


?”#secs= “,dtEnd-dtStart,”num=”,RECCOUNT(“phrases”)


 


#define MAXDIGITLEN         2          && max seq length of digits in final result


PROCEDURE GetPhrases(cNumber as String)            && Given a digit sequence(“6423946369”) get all phrases


            PUBLIC NUMDIGS,ox as dictionary.dict


            NUMDIGS = LEN(cNumber)


            ox=CREATEOBJECT(‘dictionary.dict’)            && Instantiate the dictionary COM object


            ox.DictNum=2 && 1 = 171000 word dictionary, 2= 53869 word


            CREATE CURSOR DigSeqs (DigSeq char(NUMDIGS+1))    && all Digit subsequences


            INDEX ON DigSeq TAG DigSeq


            CREATE CURSOR phrases (phrase char(30))                   && a table into which we can put all resulting phrases


            CREATE CURSOR SubWords (SubDigits char(NUMDIGS+1),SubWord char(NUMDIGS+1))        && All subwords found in dictionary


            INDEX ON SubDigits TAG SubDigits


            INDEX ON SubWord TAG SubWord


            SET ORDER TO 1


            SELECT SubWords


            GetDigSeqs(cNumber)


            FormPhrase(cNumber,””)


 


PROCEDURE GetDigSeqs(cNumber as String)           && cNumber is all digis. Find the subsequences.


            LOCAL i,fShorterFound


            IF !SEEK(cNumber+’ ‘,”DigSeqs”)                   && if the number is not in the table


                        fShorterFound= SEEK(LEFT(cNumber,LEN(cNumber)-1)+’ ‘)           && Chop off 1 digit at the end and see if that’s found


                        INSERT INTO DigSeqs VALUES (cNumber)                                                 && add the number into the table


                        IF !fShorterFound                                                                    && if the shorter was not found,


                                    EnumNum(cNumber,REPLICATE(“0”,LEN(cNumber)),1)       && enumerate all words, look in dictionary


                        ENDIF


            ENDIF


            IF LEN(cNumber) >1              && If we need to recur


                        GetDigSeqs(SUBSTR(cNumber,2))     && Recur with the string without the first digit


            ENDIF


            RETURN


 


PROCEDURE EnumNum(cNumber as String,cPattern as String, nDigNum as Integer) && cNumber is raw string of digits with no separators


            LOCAL i,cDigit,cPat,cStartLet,cLastPat,cNewLet


            cDigit=SUBSTR(cNumber,nDigNum,1)           && the digit we’re working on


            cPat=SUBSTR(cPattern,nDigNum,1)               && where are we on this digit?


            cStartLet=SUBSTR(“  adgjmptw”,ASC(cDigit)-47,1)   &&   Starting letter for this digit: 0  1  2abc, 3def, 4ghi, 5jkl, 6mno, 7pqrs, 8tuv, 9wxyz


            cLastpat =SUBSTR(“0033333434”,ASC(cDigit)-47,1)            && # permutations -1 for this digit: 0  0    4,    4,    4,    4,    4,    5,     4,    5


            DO WHILE .t.


                        IF !”0″$LEFT(cPattern,nDigNum)


                                    IF !SEEK(LEFT(cNumber,nDigNum)+’ ‘,”SubWords”,2) and ox.isword(LEFT(cNumber,nDigNum))


                                                INSERT INTO SubWords (SubDigits,SubWord) VALUES (LEFT(DigSeqs.DigSeq,nDigNum),LEFT(cNumber,nDigNum))


                                    ENDIF


                                    IF nDigNum < LEN(cNumber) && we haven’t enumerated all digits yet


                                                EnumNum(cNumber,cPattern,nDigNum+1)      && recur with next digit


                                    ENDIF


                        ENDIF


                        IF cPat=cLastPat && this digit has reached the end (for ‘7’, we’ve done “7”,”p”,”q”,”r”,”s”)


                                    EXIT


                        ENDIF


                        cPat = CHR(ASC(cPat)+1)                 && increment the pattern


                        cPattern=LEFT(cPattern,nDigNum-1) + cPat + SUBSTR(cPattern, nDigNum+1)         && insert the new letter into the pattern


                        cNewlet = CHR(ASC(cStartLet)+ASC(cPat)-49)        && from ‘a’ to ‘b’ or from ‘n’ to ‘o’


                        cNumber=LEFT(cNumber,nDigNum-1) + cNewLet + SUBSTR(cNumber, nDigNum+1)        && insert the new letter into the number


            ENDDO


            RETURN


 


PROCEDURE FormPhrase(cNumber as String,cPartResult as String)   && cNumber is digit only sequence


            LOCAL i,nLen,nRec,cPartDigit


            nLen = LEN(cNumber)


            IF nLen = 0


                        INSERT INTO phrases VALUES (SUBSTR(cPartResult,2))   && if we’ve used all the digits, then  we have a phrase. Remove the initial ‘-‘


            ELSE


                        FOR i = 1 TO nLen


                                    cPartDigit=LEFT(cNumber,i)   && the first i digits of the number


                                    IF i <=MAXDIGITLEN AND LEN(cPartResult)>0 AND ISALPHA(RIGHT(cPartResult,1))       && don’t insert 2 digit sequences adjacent


                                                FormPhrase(SUBSTR(cNumber,i+1), cPartResult+’-‘+ cPartDigit)       && insert the digits as a part result


                                    ENDIF


                                    IF LEFT(cPartDigit,1)$”01″ AND i <= MAXDIGITLEN && if it starts with ‘0’ or ‘1’, it’s ok


                                                FormPhrase(SUBSTR(cNumber,i+1), cPartResult+’-‘+ cPartDigit)


                                    ELSE


                                                SEEK cPartDigit+’ ‘      && now find any subword associated with this digit sequence


                                                SCAN WHILE SubDigits=cPartDigit+’ ‘           && for each subwords found (364 can be “dog” or “fog”)


                                                            nRec=RECNO()                                  && save/restore the current record


                                                            FormPhrase(SUBSTR(cNumber,i+1),cPartResult+’-‘+ TRIM(SubWords.SubWord))  && recur with those subwords as partial results


                                                            GO nRec


                                                ENDSCAN


                                    ENDIF


                        ENDFOR


            ENDIF


RETURN


 


End of program


 


 


 


 

Comments (6)

  1. The link for me to try this out is broken. Would you tell me what it is?

    mlprobka@imfpubs.com

  2. CalvinH [MS] says:

    I just posted a new blog entry with a new link

    http://weblogs.asp.net/calvin_hsia/archive/2004/08/03/207286.aspx

  3. Since July 4 th is nearing, I thought it would be appropriate to start my independent blog. In this first