Lee Schwartz is a Computational Linguist on the Microsoft Translator team. Today’s guest blog is about getting lost in (machine) translation…
Recently, a user seemed upset with the translation he received for "a metal paint can." No wonder. When he translated this into Spanish, he got "un metal pintura puede," which means "a metal paint is able to." And, what is that supposed to mean? But, then again, what is "meaning" to a machine translation system anyway? Does anything mean anything? Or is the computer just seeing words in combination in one language and corresponding words in another language? And is it assuming that because one sequence is used in the source language when another is used in the target, one is the translation of the other? Even if the machine translation program is just seeing words in combination, wouldn’t it have seen "paint can" before and know that "can" in this context is some kind of container? Then again, can you be sure that the computer behind the MT program knows anything about paint cans, or has seen those two words in combination? Why do you think it would have? But, giving it the benefit of the doubt, and assuming it knows all about paint cans, or at least has seen the string "paint can" a lot, how is it supposed to know how to translate "a metal paint can"? Maybe the computer has seen something like "The metal film on one side of the plate… may be obtained by …spraying a metal paint or …."
Aha! So there really are metal paints. And, if there are metal paints, why can’t "a metal paint can" be the answer to "A metal paint can, can’t it?" Well, when the words "paint" and "can" appear in sequence, it is just not likely that "can" means "be able to." But then again, it is just not likely that "can" means anything but "be able to." I guess we can say things and think things that are just not likely. I can easily understand what "A metal paint can can, can’t it?" means. The computer might just think that I inadvertently typed "can" twice. Certainly, if it learns from real data, say from the Web, it will see "can can" a lot. Maybe that is why it won’t translate "He did the can can" correctly. But really, what is English doing with so many types of cans anyway? We can even can worms, but we won’t open that one now.