Programming Challenge: Phraser


Want to prove that you are the best programmer money can buy?  (OK, I know you are not for sale, but your boss may need a friendly reminder that it’s time for your next big raise.)  Here’s your chance:


 


On a telephone keypad, the number keys 2 — 9 also carry letters on them (see the picture below).  Your task is to write a program that finds all possible ways to phrase a phone number.  By “phrase” I mean using a mixture of numbers and English words to spell out the number.  For example, the number 642-394-6369 can be phrased as nice-window, nice-wind-ox, nice-win-fox, nice-9-in-fox, etc.


 



 


Your input includes a list of English words, as well as the phone number as a sequence of digits.  For testing your program, you can use a basic English word list I found here.


 


The merit of an entry will be judged mainly by how clear the code is, not necessarily the efficiency.  You can choose your favorite programming language, as long as it gets the job done (If it’s not a widely known language, you probably want to make sure the code is self-documenting, so other people can understand why your code is so cool).


 


I will post my own solution in several days.  (In case you are wondering, it is not in C++.)


 


Enjoy!

Comments (44)

  1. Joe Lisp User says:

    This is pretty similar to the programming problem that was used to compare Java to C/C++ (and later to Lisp) a few years ago:

    http://www.norvig.com/java-lisp.html

  2. Anonymous says:

    I didn’t see you mentioning payment or a prize? Do you expect someone doing your job for free? 🙂

  3. the1 says:

    Dear Joe,

    Thanks for the pointer! I wasn’t aware that there was a similar contest. But we shouldn’t let that take away the fun, should we? Let’s try to beat Norvig!

    Dear Anonymous,

    By winning the contest, you gain unlimited bragging right. Didn’t I also mention your boss’s reaction? Come on, there is no excuse for not doing it. 🙂

  4. senkwe says:

    Hey, isn’t including numbers in the phrase a cop out? eg "nice-9-in-fox". Shoudn’t there be a few constraints on how many numbers are allowed in the phrase?

  5. the1 says:

    Dear Senkwe,

    Let’s keep it simple. For now I don’t care how many numbers you use in the phrase. The idea is to get ALL different ways to phrase your phone number. Additional constraints (e.g. 3 digits allowed at most) can aways be added later to filter the result. The principle of modular programming says we should do only ONE thing (and do it well) in one module.

  6. B.Y. says:

    OK, I have a question. When you say "how clear the code is", does the word code mean code only, or code and comment ? If I have the following two entries:

    *entry A: simple code without comment.

    *entry B: slightly more complex code than A, but it has a ton of comments to make it easier to understand.

    which entry is considered better in this contest ?

  7. the1 says:

    Dear B.Y.,

    You are really serious about this. 🙂 This is a contest for clear code. While good comments are important, they are not the focus. (Actually often I don’t trust comments, because I’ve seen so many cases when they are out of sync with what the code really does — programmers always update the code and forget to change the comments accordingly).

    Good code should be clear on itself and require few comments. Bad comments are worse than no comments at all. I hate people writing comments just for comments’ sake, e.g.

    a++; // increase the value of a by 1

    In your example, A will be preferred — as long as it can be easily understood without comment.

    Finally, I don’t want to pretend being an authority on code clarity. There is no judge for this contest. 🙂 The whole purpose of this chanllenge is to have some fun and learn from each others’ different angles of solving problems. (OK, I lied about the big raise, but if we learn something from this, it can be applied to get us closer to the raise.)

  8. Here are some thoughts on comments: http://weblogs.asp.net/jaybaz_ms/archive/2004/03/22/94066.aspx

    If your code can’t be understood without comments, then either add them or simplify. (Doing neither is bad!)

  9. CalvinH [MS] says:

    My boss forwarded me this blog<g>

    I have a 100 line solution (with comments) not written in C++. It uses a 53000 word dictionary and does not have much optimization, although I have an optimized version too.

    How should I submit this entry? Should I just post it here?

    thanks

  10. the1 says:

    Dear Calvin,

    Wow, the first entry! Congratulations!

    You can either post the code here, or put it on your own blog (if you have one or want to create one) and provide a link.

    If you post it here, the format and indentation will be lost. 🙁 Let’s see how it looks, and if necessary, you can email me (Zhanyong Wan) the code and I’ll post it in its original format.

    Also, can you provide a link to your dictionary so others can also use it? Thanks!

  11. CalvinH [MS] says:

    Here’s a link to the programming challenge solution on my blog. It also includes a link to the dictionary I used.

    http://blogs.msdn.com/vsdata/archive/2004/03/31/105159.aspx

  12. I made some changes to the algorithm by cutting out single character responses. I don’t bother placing the dashes in since I don’t think they are uber important, but I could put that feature back in if necessary. The code is hosted on.

    http://weblogs.asp.net/justin_rogers/articles/105916.aspx

    I’m using the cheapo dictionary for now, and I’ve done 0 optimizations. The only true optimization is in the switch table.

    One thing to note. You can really optimize the algorithm in the case that a 1 or 2 somehow bisects the numbers. When you do this, you only have to process two substrings, reinserting the splitting character where necessary. This can result in a huge performance gain. However, a good deal of the numbers you would process don’t contain a 1 or a 0, and so the perf gain is lost.

  13. And my algorithm isn’t properly convergent either, so I’m not yet completed.

  14. Nick Berardi says:

    "Nice" isn’t in the dictionary list that you provided.

  15. Nick Berardi says:

    neither is "in", "win", or "fox". just to let everybody know for testing purposes.

  16. Nick Berardi says:

    Here is my submission for the contest. You can find the code at http://blog.zigamorph.com/Default.aspx/archive/2004/04/07/158.aspx

    In addition you just need to take the word is in this blog and add "nice", "in", "win", and "fox". Then name it "Dictionary.txt" and put it in the same directory as your program.

    Have fun and if you find any bugs please submit it to my blog for the code.

    For all of you that want to know how long this too, it took about 3 hours to wrap my mind around the problem and code it. So please forgive spelling errors.

    Thanks, Nick

  17. Nick Berardi says:

    Oh one more thing. In you original post you mentioned that there was 160 solutions for the number 6423946369 but my program returns 228 possible solutions. I have verified all the data am I missing something in my program? Anybody see an error, or was there an error with the original dataset?

  18. the1 says:

    Nick,

    Your result list is bigger because your word list contains "ox", which mine doesn’t have.

  19. Nick Berardi says:

    Ah..

    That makes more sense, I had to add that to get one of the 4 results you had listed above.

  20. Nick Berardi says:

    Hey Mike,

    Nice easy solution to look at. But what dictionary did you use. Because the one I got off the site above doesn’t seem to work. Well at least there seems to be a loop that never ends?

  21. Brian C Robinson says:

    Ok, here’s my beautifully innefficient but correct and clear solution:

    http://www.herassmygod.org/bcr19374/programming/Phraser/Phraser.py

    Tests available here:

    http://www.herassmygod.org/bcr19374/programming/Phraser/PhraserTest.py

    And an example of using the class (assuming words.txt in the current directory is a text file with one word per line, as the example stated in the problem):

    http://www.herassmygod.org/bcr19374/programming/Phraser/PhraserRun.py