Windows – Read me that virtual machine

After a couple of weeks of playing around with Hyper-V APIs for reading virtual machine screens and sending keystrokes – I hit upon an interesting idea.  What would it take to make a “virtual machine screen reader”?

You see, Windows itself has great support for a number of accessibility options.  And these work both in the host operating system environment – and inside the virtual machine when you are running Windows as a guest.  But what if you are not running Windows as a guest?  What if the guest OS is not actually running (e.g. BIOS screens, fatal errors, etc…)?

Well – with a little work I now have a sample script that will:

  1. Scrape the graphical content of a virtual machine screen
  2. Feed it into the Tesseract OCR library
  3. Feed the results of that into the Windows Speech Synthesis engine
  4. And read the screen to you

The results look like this:

[Embedded demo of the script not preserved]

And the code needed to do this is as follows:

A couple of things to call out here:

  • To pull this off I am using the Tesseract Open Source OCR Engine and the PowerShell wrapper for it written by Jourdan Templeton
  • In order to get the best level of accuracy in OCR – I made two specific changes:
    • I stretch the VM screen bitmap before performing OCR (I do not know why this matters – but it does make a difference)
    • I edited tesseractlib.psm1 from Jourdan’s wrapper to specify [Tesseract.EngineMode]::TesseractandCube instead of [Tesseract.EngineMode]::default.  This makes it slower – but more accurate
  • The sample above will capture the whole screen by default – and read it to you in a female voice.  There are a number of changes that you can make here:
    • If you specify a crop rectangle on line 4 – the script will only read a portion of the screen.
    • If you set $speakItToMe = $false on line 5 – the script will output text, instead of speaking.
    • If you change line 60 to $speak.SelectVoiceByHints('Male') – you will get a male speaker instead.
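
To make the options above a little more concrete, here is a minimal sketch of the crop, stretch and speak-or-print steps. It is not the original script. The function names Get-StretchedBitmap and Out-ScreenText, the 2x scale factor and the bicubic interpolation are all assumptions made for illustration, and the OCR call is left out entirely. It assumes Windows PowerShell 5.1, where System.Drawing and System.Speech ship with the .NET Framework.

```powershell
# Sketch only: assumes Windows PowerShell 5.1 with the .NET Framework assemblies below.
Add-Type -AssemblyName System.Drawing
Add-Type -AssemblyName System.Speech

# Hypothetical helper: optionally crop a bitmap, then stretch it before OCR.
function Get-StretchedBitmap {
    param(
        [Parameter(Mandatory)] [System.Drawing.Bitmap] $Source,
        # An empty rectangle means "use the whole screen".
        [System.Drawing.Rectangle] $Crop = [System.Drawing.Rectangle]::Empty,
        # Assumed factor; the post does not say how far the bitmap is stretched.
        [double] $Scale = 2.0
    )

    if ($Crop.IsEmpty) {
        $Crop = New-Object System.Drawing.Rectangle -ArgumentList 0, 0, $Source.Width, $Source.Height
    }

    $width  = [int]($Crop.Width * $Scale)
    $height = [int]($Crop.Height * $Scale)
    $result = New-Object System.Drawing.Bitmap -ArgumentList $width, $height
    $target = New-Object System.Drawing.Rectangle -ArgumentList 0, 0, $width, $height

    $graphics = [System.Drawing.Graphics]::FromImage($result)
    try {
        # Interpolation mode is an assumption; any smoothing mode could be tried here.
        $graphics.InterpolationMode = [System.Drawing.Drawing2D.InterpolationMode]::HighQualityBicubic
        # Copy the cropped source region into the larger target rectangle.
        $graphics.DrawImage($Source, $target, $Crop, [System.Drawing.GraphicsUnit]::Pixel)
    }
    finally {
        $graphics.Dispose()
    }
    return $result
}

# Hypothetical helper: either speak the text or hand it back as plain output.
function Out-ScreenText {
    param(
        [Parameter(Mandatory)] [string] $Text,
        [bool] $SpeakItToMe = $true,
        [ValidateSet('Male', 'Female', 'Neutral')] [string] $Gender = 'Female'
    )

    if (-not $SpeakItToMe) {
        return $Text
    }

    $speak = New-Object System.Speech.Synthesis.SpeechSynthesizer
    try {
        $speak.SelectVoiceByHints([System.Speech.Synthesis.VoiceGender]$Gender)
        $speak.Speak($Text)
    }
    finally {
        $speak.Dispose()
    }
}

# Usage: a blank bitmap stands in for the captured VM screen,
# and a fixed string stands in for the OCR result.
$screen  = New-Object System.Drawing.Bitmap -ArgumentList 640, 480
$region  = New-Object System.Drawing.Rectangle -ArgumentList 0, 0, 320, 100
$toOcr   = Get-StretchedBitmap -Source $screen -Crop $region
Out-ScreenText -Text 'Placeholder for OCR output' -SpeakItToMe $false
```

Passing -SpeakItToMe $true (the default) would speak the text instead, and -Gender Male corresponds to the male-voice change described above.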

Cheers,
Ben