As I mentioned in a previous post, I am working on rebuilding my foreign language vocabulary program to support a number of new features, one of the key features being speech recognition. This poses a number of challenges, the primary one being that we do not have recognition engines for several of the languages. I currently review vocabulary in German, French, Italian, Russian, Spanish, Arabic, and Thai. I am quite sure we do not have engines for Thai and Arabic, and I suspect (though I may be wrong) that we do not have them for Italian or Russian either.
In its simplest form, the scenario is this: I enter a word in a given language and later take a test where I am asked to speak the translation into a microphone. Obviously, if I am doing a reverse test (German to English, Russian to English, etc.), this is a simple exercise. But if the target language is one for which we do not have an engine, the exercise becomes quite complex.
Originally my idea was simply to type in the closest phonemes for the word in a language that does have an engine. There are several problems with this. First, it eats up more time when entering words; given that this whole exercise is intended to save me time, that is counterproductive. Second, I am not sure that I will get the phonemes right, which will probably lead to a lot of false negatives and require fine-tuning the phonemes until recognition succeeds.
My current thought is to use an algorithm similar to hashing. The idea is that when I speak the word in the other language, it is recognized as some result in a different language. When I speak the same word again, it should "hash" to the same result. Of course, this may not be as accurate as a native engine, but I suspect it will be sufficient for my uses. This solution raises two new problems.
1) How do I determine the correct "hash" of the word?
2) What should the grammar look like?
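To make the "hash" idea above concrete, here is a minimal Python sketch. The recognizer is abstracted behind a callable so it can stand in for whatever engine is available (say, English); the names `VocabHasher`, `enroll`, and `check` are my own illustrative inventions, not part of any real speech API, and the stub "audio" strings simply stand in for real microphone input.

```python
# Sketch of the "hash" approach: whatever the available engine hears when I
# say a foreign word becomes that word's "hash"; later attempts must
# produce the same recognition result to count as correct.

class VocabHasher:
    def __init__(self, recognize):
        # recognize: callable taking audio and returning the engine's text.
        self.recognize = recognize
        self.hashes = {}  # vocab word -> recognized text (its "hash")

    def enroll(self, word, audio):
        """Record how the engine hears me say `word` the first time."""
        self.hashes[word] = self.recognize(audio)

    def check(self, word, audio):
        """Later, the same spoken word should "hash" to the same result."""
        return self.recognize(audio) == self.hashes.get(word)

# Usage with a stub recognizer standing in for a real engine:
fake_engine = {"audio-khao": "cow", "audio-khao-2": "cow"}
hasher = VocabHasher(lambda audio: fake_engine[audio])
hasher.enroll("ข้าว", "audio-khao")          # Thai word recognized as "cow"
print(hasher.check("ข้าว", "audio-khao-2"))  # consistent -> True
```

The key assumption, of course, is that the engine is consistent: the same speaker saying the same word should land on the same recognition result most of the time.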
I have not yet settled on a solution for either problem, though I have come up with a few ideas. I will likely need to experiment to determine which of them, if any, works best.
The second problem actually seems a bit more straightforward to me. I currently have three ideas for it.
1) Create an extremely large grammar consisting of a wide range of possible matches in the language. Of course, I would need an algorithm to create a reasonable set of matches, but I can use linguistic knowledge of the target language to do this. I may also have to modify the grammar on the fly for new words that don't match, tying in a solution to the first problem. In this solution, every recognition would use this grammar.
2) Create a smaller grammar for each word that would contain the "hash" plus several incorrect variants of it. Therefore, if the word is mispronounced, it would be more likely to match one of the incorrect variants than the correct one.
3) Create a grammar with the hashed values of all vocabulary words available. Most likely this would be combined with solution #2.
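Idea #2 might be sketched like this: a per-word grammar holding the correct "hash" plus deliberately wrong variants, so a mispronunciation snaps to a distractor instead of falsely matching. The confusion pairs below are a made-up toy set; a real set would come from the target language's phonology (aspirated vs. unaspirated consonants in Thai, for example).

```python
# Sketch of idea #2: a per-word grammar containing the correct "hash" plus
# near-miss distractors, each tagged as correct or incorrect.

# Hypothetical confusion pairs; a real list would be built from the
# phonology of the target language.
CONFUSIONS = [("p", "b"), ("t", "d"), ("k", "g"), ("s", "z")]

def variants(phrase):
    """Generate near-miss distractors by single-sound substitutions."""
    out = set()
    for a, b in CONFUSIONS:
        for x, y in ((a, b), (b, a)):
            if x in phrase:
                out.add(phrase.replace(x, y, 1))
    out.discard(phrase)
    return sorted(out)

def build_grammar(correct_hash):
    """Grammar = the correct hash plus its distractors, each tagged."""
    entries = [(correct_hash, True)]
    entries += [(v, False) for v in variants(correct_hash)]
    return entries

grammar = build_grammar("pane")  # Italian "bread", hashed via an English engine
# grammar contains ("pane", True) plus distractors such as ("bane", False)
```

Whichever phrase the engine picks from this grammar tells me immediately whether the attempt was close enough, which is the whole point of seeding it with plausible mistakes.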
To solve the first problem, I have thought of the following solutions.
1) This idea is basically the same as #1 above: creating a large grammar with the possible variants. Since it is not feasible to have every phoneme variant available, I would need an algorithm that can extend the grammar when a new word does not match.
2) Create a phoneme match based on the text of the word. For some languages, such as Italian, this is relatively easy; for others, such as Thai, where spelling rules are complex and words do not always follow them, it is much more difficult. Currently I am leaning towards the first solution.
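For a spelling-regular language like Italian, solution #2 could start from a small ordered rule table. The rules below are a toy subset I put together for illustration, nowhere near a complete Italian grapheme-to-phoneme system, and the function name `italian_g2p` is my own.

```python
# Toy grapheme-to-phoneme conversion for Italian (solution #2). Italian
# spelling is regular enough that an ordered rule table covers many words;
# this is only an illustrative subset, not a complete rule set.

RULES = [
    ("chi", "ki"), ("che", "ke"),    # "ch" keeps the /k/ sound before e/i
    ("ci", "tʃi"), ("ce", "tʃe"),    # plain "c" softens before e/i
    ("ghi", "gi"), ("ghe", "ge"),    # "gh" keeps the hard /g/ before e/i
    ("gi", "dʒi"), ("ge", "dʒe"),    # plain "g" softens before e/i
    ("gn", "ɲ"),                     # as in "gnocchi"
    ("sci", "ʃi"), ("sce", "ʃe"),    # "sc" softens before e/i
]

def italian_g2p(word):
    """Apply rules longest-first at each position; default: copy the letter."""
    word = word.lower()
    out, i = [], 0
    while i < len(word):
        for graph, phon in RULES:
            if word.startswith(graph, i):
                out.append(phon)
                i += len(graph)
                break
        else:
            out.append(word[i])
            i += 1
    return "".join(out)

print(italian_g2p("chiesa"))  # "chi" -> "ki", rest copied
print(italian_g2p("ciao"))    # "ci" -> "tʃi", rest copied
```

For Thai this table-driven approach falls apart quickly, since the written form often underdetermines the pronunciation, which is exactly why I lean towards the first solution there.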
There are other problems besides these two. Some languages, such as Thai, contain aspirated and unaspirated letters, and the meaning of a word changes depending on which is used. Chinese and Thai also have the tone issue, though we do have an engine for Chinese, so I may be able to make use of its solution. I will blog more as I approach a solution, but in the meantime comments are always welcome.