Text to Speech in Mission Impossible 3: A Dissection

Besides being the best of the three MI movies, Mission Impossible 3 contains two instances of TTS that deserve some discussion (and clarification). The first scene was simple and plausible, while the second was a definite stretch (i.e., not doable with today’s technology).

In the first scene with TTS, one of the “good” guys was automating the destruction of a warehouse full of “bad” guys, using vehicles equipped with large guns. When the automation started, the computer began speaking some information aloud using TTS. I’m pretty sure it was Mac OS X TTS. Definitely low on the naturalness scale, but intelligible nonetheless. (Can anyone confirm which TTS voice this was?)

In the second scene(s), THE “good” guy (i.e., Tom Cruise’s character) forces THE bad guy at gunpoint to read several sentences off a business-card-sized card. The sentences are syntactically well formed but semantically meaningless. Within seconds of the reading being completed, another “good” guy, stationed beneath the complex, has intercepted the audio and used it to generate a highly natural and intelligible TTS voice. That voice is sent back to our protagonist, who is in a bathroom and can then talk with the “bad” guy’s voice.

OK, so I’m actually quite forgiving in movies, giving the technology the benefit of the doubt (i.e., I pretend that I’m watching Sci-Fi and not a modern-day action movie). So, if we assume that this was some other technology beyond TTS, great. No worries. However, if you insist that the movie follow currently plausible technology, then here’s what’s wrong with the TTS in this second scene:

1) The TTS engine was generated from several sentences. Today, it takes many, many hours of recordings to generate a natural-sounding engine.

2) The recording was done in a bathroom next to a loud party and then streamed to a nearby underground location. That is not likely to result in the high-quality recordings that one would need for TTS.

3) The recording was streamed through rock. I’m imagining that some signal loss would be encountered in real life.

4) The resulting TTS sounded almost EXACTLY like the “bad” guy (egads, as if it truly was the other actor speaking with Tom lip-synching)! Even the BEST concatenative engines (i.e., those based on 40+ hours of recordings of a person’s voice) won’t sound just like the real person.
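To put points 1 and 4 in rough numbers, here is a back-of-the-envelope sketch. The 40-hour figure is the one quoted above; the 30 seconds for reading the card is purely my assumption (I did not time the scene), and `shortfall_factor` is just an illustrative helper name.

```python
# Rough comparison of the audio in the movie's card reading with the audio
# behind a high-end concatenative voice.
CARD_READING_SECONDS = 30  # ASSUMPTION: a guess at how long reading the card takes
CORPUS_HOURS = 40          # from the post: "40+ hours" for the best engines


def shortfall_factor(sample_seconds: float, corpus_hours: float) -> float:
    """Return how many times more audio the corpus contains than the sample."""
    return (corpus_hours * 3600) / sample_seconds


factor = shortfall_factor(CARD_READING_SECONDS, CORPUS_HOURS)
print(f"A {CORPUS_HOURS}-hour corpus is roughly {factor:,.0f}x the card reading.")
```

Under that assumption the gap is in the thousands, and a longer or shorter guess for the card reading does not change the order of magnitude by much.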

Comments? Alternative takes?

Comments (9)

  1. Rosyna says:

    Uhm, the latter part wasn’t a TTS engine at all. It was a frequency/pitch matching engine dealie. He had him read a card full of consonant, vowel, and combination sounds. This audio was sent (I imagine digitally) to Luther, Luther’s many computers compiled a modulation algorithm based on Tom Cruise’s voice and sent the finished algorithm back to the chip inside his neck. This program just adjusted frequency and what-not. It did not actually have to do any Text To Speech. It turned the sound of one voice into the sound of another. No biggy. It’s like how you can make male speech sound like female speech in an audio program.

    However, the timing of each syllable for Cruise and what’s his face would have to be identical for this to be convincing.

  2. jaywaltm says:

    Yes, you are right. I mingle my worlds of speech synthesis with text to speech. Technically, it was speech synthesis and not text to speech.
