After a couple of weeks of playing around with the Hyper-V APIs for reading virtual machine screens and sending keystrokes, I hit upon an interesting idea: what would it take to make a “virtual machine screen reader”?
You see, Windows itself has great support for a number of accessibility options. These work both in the host operating system and inside the virtual machine when you are running Windows as a guest. But what if you are not running Windows as a guest? What if the guest OS is not actually running at all (e.g. BIOS screens, fatal errors, etc.)?
Well, with a little work I now have a sample script that will:
- Scrape the graphical content of a virtual machine screen
- Feed it into the Tesseract OCR library
- Feed the results of that into the Windows Speech Synthesis engine
- And read the screen to you
The results look like this:
And the code needed to do this is as follows:
A couple of things to call out here:
- To pull this off I am using the Tesseract Open Source OCR Engine and the PowerShell wrapper for it written by Jourdan Templeton.
- To get the best OCR accuracy, I made two specific changes:
  - I stretch the VM screen bitmap before performing OCR (I do not know why this matters, but it does make a difference).
  - I edited tesseractlib.psm1 from Jourdan’s wrapper to specify [Tesseract.EngineMode]::TesseractandCube instead of [Tesseract.EngineMode]::default. This makes it slower, but more accurate.
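To illustrate the stretching step, here is a minimal sketch of how a bitmap could be enlarged with System.Drawing before it is handed to the OCR engine. This is not the code from the sample script: the function name Resize-Bitmap, the $Factor parameter, and its default of 2.0 are all assumptions made for illustration. One plausible reason stretching helps is that Tesseract is tuned for scanned documents at roughly 300 DPI, so small on-screen fonts may simply have too few pixels per character, but I have not verified that this is the cause here.

```powershell
Add-Type -AssemblyName System.Drawing

# Hypothetical helper (not part of the original script):
# returns a new bitmap that is $Factor times the size of $Source.
function Resize-Bitmap {
    param(
        [Parameter(Mandatory)] [System.Drawing.Bitmap] $Source,
        [double] $Factor = 2.0   # assumed scale factor, chosen for illustration
    )

    $width  = [int]($Source.Width  * $Factor)
    $height = [int]($Source.Height * $Factor)

    $target   = New-Object System.Drawing.Bitmap($width, $height)
    $graphics = [System.Drawing.Graphics]::FromImage($target)
    try {
        # Bicubic interpolation keeps character edges smoother than the default mode
        $graphics.InterpolationMode = [System.Drawing.Drawing2D.InterpolationMode]::HighQualityBicubic
        $graphics.DrawImage($Source, 0, 0, $width, $height)
    }
    finally {
        $graphics.Dispose()
    }

    return $target
}
```

Assuming $vmScreen already holds the captured VM screen as a System.Drawing.Bitmap, the call would look like `$stretched = Resize-Bitmap -Source $vmScreen -Factor 2`.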
- The sample above captures the whole screen by default and reads it to you in a female voice. There are a number of changes that you can make here:
  - If you specify a crop rectangle on line 4, the script will only read that portion of the screen.
  - If you set $speakItToMe = $false on line 5, the script will output the text instead of speaking it.
  - If you change line 60 to $speak.SelectVoiceByHints('Male'), you will get a male voice instead.
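The speech-related options above can be sketched in isolation as follows. This is a simplified stand-in rather than the script itself: the $text value is a placeholder for whatever the OCR step returns, and the snippet assumes Windows PowerShell, where the System.Speech assembly is available.

```powershell
Add-Type -AssemblyName System.Speech

$speakItToMe = $true    # set to $false to print the text instead of speaking it
$text = 'Text recognized from the virtual machine screen'   # placeholder for the OCR output

if ($speakItToMe) {
    $speak = New-Object System.Speech.Synthesis.SpeechSynthesizer
    try {
        # PowerShell converts the string to the VoiceGender enum; use 'Male' for a male voice
        $speak.SelectVoiceByHints('Female')
        $speak.Speak($text)    # blocks until the text has been read out
    }
    finally {
        $speak.Dispose()
    }
}
else {
    Write-Output $text
}
```

SelectVoiceByHints is only a hint: if no installed voice matches the requested gender, the synthesizer keeps whichever voice it already had selected.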