Building UI Automation client applications in C++ and C#

Sample #1: https://code.msdn.microsoft.com/Windows-7-UI-Automation-9131f729

Back in March I had the opportunity to attend the 26th Annual International Technology & Persons with Disabilities Conference, otherwise known as CSUN 2011. I was to present a session on the Windows UI Automation (UIA) API. The UIA API can be used by apps to gather information about UI shown in other apps, and to programmatically control those apps. While the conference is not primarily for developers, some developers attend and Windows UIA plays an important role in accessibility. I decided to concentrate on the UIA Client API rather than the Provider API, as I expected that CSUN attendees would be interested in building end-user AT (assistive technology) apps.

So I went to MSDN and looked through the API reference for the Windows 7 UIA Client API (https://msdn.microsoft.com/en-us/library/ee671216(v=VS.85).aspx). (By the way, I soon learnt to check for the text “(Windows)” when reviewing MSDN’s UIA search results to make sure that what I was looking at related to Windows 7 UIA and not .NET’s managed UIA API.) I made a list of the Client API interfaces that would be commonly used, and decided that my session would concentrate on those. I then had to get familiar with how those common interfaces are used. For me, the only way to do that is by writing code that calls them. There’s a ton of useful information about UIA on MSDN, but it will only stick with me if I can step through code in Visual Studio, see those interfaces called, and check the results of the calls. So I needed a sample.

I decided that my sample should involve a browser (given that end-users spend so much time working with browsers), and that it should both get data from, and control, the browser. I set to work first with some fundamental actions to get data from the browser (with calls like ElementFromHandle() and get_CurrentName()), and then moved on to getting hyperlinks from the browser and using caching to improve performance (with calls like FindAllBuildCache() and get_CachedName()). The end result was the sample up at https://code.msdn.microsoft.com/Windows-7-UI-Automation-9131f729. This demonstrates many of the common client-side UIA API interfaces, along with lots of comments describing the API use.

The sample also makes use of another Windows accessibility-related API, the Magnification API, to highlight the area on the screen where the hyperlink lies. While this API isn’t specifically related to UIA, it was interesting to build an app that leverages both APIs. The image below shows the app with a list of hyperlinks found in the browser, along with one of the hyperlinks magnified, (and color inverted.)

 

While I was building the sample, I found two aspects which took me longer to complete than I’d expected. The first related to the threading model. There are rules relating to how threading applies to Windows 7 UIA Client API use, and while the steps taken by your code are the usual sorts of things you’d do when building a multi-threaded app, you do have to know that they’re required. The first threading requirement is that if your app uses UIA to interact with any of its own UI, then the calls you make to UIA must be made on a background MTA thread. The other requirement is that any UIA event handler you add must also be added on a background MTA thread (and the call to later remove the handler must be made on the same thread on which it was added). So in my first attempt at the sample, I had my event handler on a background MTA thread, and all other UIA calls on my main UI thread (because my app doesn’t interact with its own UI). However, having done that, I found some unexpected delays beneath calls to UIA. So I then created another background MTA thread and moved all the calls to UIA which were previously on my main UI thread onto that new background thread. Once I’d done that, I experienced no delays beneath UIA calls at all. So for me, these are the rules I’m living with:

1. Any calls to add or remove a UIA event handler must be done on the same background MTA thread.

2. If you make any calls to UIA to interact with your own UI, or you have an event handler, make all UIA calls from a background MTA thread and not from your main UI thread.

This meant building the sample took a little longer than expected, but it’s no big deal because many shipping apps would introduce background threads to make sure their UI is snappy at all times. Having built the sample with its UI thread and two background threads, I’ll be using it as a starting point for future samples.
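
One way to follow both rules is to give UIA a thread of its own and hand it work through a queue. The sketch below is not taken from the sample: UiaWorker and Post are hypothetical names, and the sample itself uses two background threads where this shows only one to stay short. The COM calls are wrapped in #ifdef _WIN32 purely so that the queueing logic can be built and tried on its own; on Windows the thread joins the MTA with CoInitializeEx before it runs any task.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

#ifdef _WIN32
#include <objbase.h>                  // CoInitializeEx / CoUninitialize
#pragma comment(lib, "ole32.lib")
#endif

// Hypothetical helper (not from the sample): owns one background thread and
// runs every posted task on it, so all UIA calls share the same MTA thread.
class UiaWorker {
public:
    UiaWorker() : thread_([this] { Run(); }) {}

    ~UiaWorker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_one();
        thread_.join();               // queued tasks are drained before exit
    }

    UiaWorker(const UiaWorker&) = delete;
    UiaWorker& operator=(const UiaWorker&) = delete;

    // Queue a task; the returned future becomes ready once it has run.
    std::future<void> Post(std::function<void()> task) {
        assert(task);
        std::packaged_task<void()> packaged(std::move(task));
        std::future<void> done = packaged.get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(packaged));
        }
        wake_.notify_one();
        return done;
    }

private:
    void Run() {
#ifdef _WIN32
        // Join the multithreaded apartment before any UIA call is made.
        HRESULT hr = CoInitializeEx(nullptr, COINIT_MULTITHREADED);
#endif
        for (;;) {
            std::packaged_task<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) break;   // stopping, and nothing left to run
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();   // e.g. add or remove an event handler, or query an element
        }
#ifdef _WIN32
        if (SUCCEEDED(hr)) CoUninitialize();
#endif
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::packaged_task<void()>> tasks_;
    bool stopping_ = false;
    std::thread thread_;   // declared last so the members above exist first
};
```

Because the call that adds an event handler and the call that later removes it would both go through Post, they end up on the same thread, which is what rule 1 asks for. The UI thread only ever waits on (or ignores) the returned future.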

The other aspect of the sample which gave me pause related to caching. Caching is a really useful part of the Windows 7 UIA API. It’s a way to tell UIA to “get me this element, and while you’re at it, get me this information about the element”. The alternative would be to tell UIA to get an element, and once it’s returned the element to you, to tell UIA to go off again and get the data about the element. Each time you tell UIA to go back to another application and get data, it requires a cross-process call, and that takes time. A high performance client app needs to reduce the number of cross-process calls that it asks UIA to make. So if you tell UIA to make only one cross-process call to get an element and some related data, that’s clearly preferable to telling it to make many cross-process calls (that is, one to get the element and one each time it retrieves an additional piece of information).
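
To put rough numbers on that, here is a toy model of the two patterns. Nothing in it is UIA: FakeProvider and its methods are made up for illustration, and the only thing it does is count how often the client has to cross the process boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for another process's UI. It is not UIA; it only counts
// how many times the client "crosses the process boundary".
class FakeProvider {
public:
    explicit FakeProvider(std::vector<std::string> linkNames)
        : names_(std::move(linkNames)) {}

    // One round trip: returns lightweight handles (indices) to the links.
    std::vector<std::size_t> FindLinks() {
        ++roundTrips_;
        std::vector<std::size_t> handles(names_.size());
        for (std::size_t i = 0; i < handles.size(); ++i) handles[i] = i;
        return handles;
    }

    // One round trip per call: fetch one property of one element.
    std::string GetName(std::size_t handle) {
        ++roundTrips_;
        return names_.at(handle);
    }

    // One round trip in total: the names travel back with the elements.
    std::vector<std::string> FindLinksWithNames() {
        ++roundTrips_;
        return names_;
    }

    int roundTrips() const { return roundTrips_; }

private:
    std::vector<std::string> names_;
    int roundTrips_ = 0;
};

// Uncached pattern: find first, then ask for each name separately.
inline int CountUncachedRoundTrips(const std::vector<std::string>& names) {
    FakeProvider provider(names);
    for (std::size_t handle : provider.FindLinks()) {
        provider.GetName(handle);
    }
    return provider.roundTrips();
}

// Cached pattern: names come back with the elements in the same request.
inline int CountCachedRoundTrips(const std::vector<std::string>& names) {
    FakeProvider provider(names);
    provider.FindLinksWithNames();
    return provider.roundTrips();
}

// Example: a page with three links.
inline void DemoRoundTrips() {
    const std::vector<std::string> links = {"Home", "Products", "Contact"};
    assert(CountUncachedRoundTrips(links) == 4);   // 1 find + 3 name fetches
    assert(CountCachedRoundTrips(links) == 1);     // everything in one request
}
```

The gap grows with the page: for N links and one property each, the uncached pattern costs N + 1 round trips in this model while the cached pattern stays at one.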

However, there are a few ways you can control how caching behaves, and this is where it took some concentration on my part to understand what to do when building the sample. You can specify such related things as the “scope” (e.g. do you also want to cache data relating to children of the element?), the “tree filter” (e.g. do you want to filter out elements which are unlikely to be of interest to your users?), and the “element mode” (e.g. will the cached data be all that you need, or do you need a reference to all the elements too?). At first glance, all this might seem more than you want to have to deal with, but once you’re familiar with it, it allows you to tune your use of UIA to get exactly the data you need in the least amount of time. (In a later sample, I realized I only needed cached data, so I was pleased I could use the element mode in the cache to reduce the work UIA had to do.)
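
As a rough sketch of how those three settings appear in code, the function below builds a cache request and uses it with FindAllBuildCache() to collect hyperlink names. GetLinkNames is a hypothetical helper, not a function from the sample, and the particular choices (element-only scope, control view filter, no element references) are just one plausible combination rather than what the sample necessarily uses. The #ifdef _WIN32 guard is only there to keep non-Windows toolchains from tripping over the Windows headers.

```cpp
#ifdef _WIN32   // UIA ships with Windows; the guard only helps other toolchains
#include <windows.h>
#include <uiautomation.h>
#include <cassert>
#include <string>
#include <vector>
#pragma comment(lib, "ole32.lib")
#pragma comment(lib, "oleaut32.lib")

// Hypothetical helper: returns the names of the hyperlinks beneath `root`,
// with the search and the name retrieval bundled into one cached request.
// COM must already be initialized on the calling thread.
std::vector<std::wstring> GetLinkNames(IUIAutomation* automation,
                                       IUIAutomationElement* root) {
    assert(automation && root);
    std::vector<std::wstring> names;
    IUIAutomationCacheRequest* cache = nullptr;
    IUIAutomationCondition* controlView = nullptr;
    IUIAutomationCondition* isLink = nullptr;
    IUIAutomationElementArray* links = nullptr;

    VARIANT type;
    VariantInit(&type);
    type.vt = VT_I4;
    type.lVal = UIA_HyperlinkControlTypeId;

    if (SUCCEEDED(automation->CreateCacheRequest(&cache)) &&
        SUCCEEDED(automation->get_ControlViewCondition(&controlView)) &&
        SUCCEEDED(automation->CreatePropertyCondition(
            UIA_ControlTypePropertyId, type, &isLink))) {
        // "While you're at it, get me this information": the element's name.
        cache->AddProperty(UIA_NamePropertyId);
        // Scope: cache data for each found element only, not its children.
        cache->put_TreeScope(TreeScope_Element);
        // Tree filter: the control view, which leaves out non-control elements.
        cache->put_TreeFilter(controlView);
        // Element mode: cached data only, no live reference to each element.
        cache->put_AutomationElementMode(AutomationElementMode_None);

        if (SUCCEEDED(root->FindAllBuildCache(TreeScope_Descendants, isLink,
                                              cache, &links)) && links) {
            int count = 0;
            links->get_Length(&count);
            for (int i = 0; i < count; ++i) {
                IUIAutomationElement* link = nullptr;
                if (FAILED(links->GetElement(i, &link)) || !link) continue;
                BSTR name = nullptr;
                // Served from the cache: no further cross-process call.
                if (SUCCEEDED(link->get_CachedName(&name)) && name) {
                    names.emplace_back(name, SysStringLen(name));
                }
                SysFreeString(name);
                link->Release();
            }
        }
    }
    if (links) links->Release();
    if (isLink) isLink->Release();
    if (controlView) controlView->Release();
    if (cache) cache->Release();
    return names;
}
#endif  // _WIN32
```

With AutomationElementMode_None the returned elements carry only their cached data, so a client that later needs to act on a link (for example, to invoke it) would ask for AutomationElementMode_Full instead.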

 

Sample #2: https://code.msdn.microsoft.com/Windows-7-UI-Automation-0625f55e

After stepping through the first sample at my CSUN session, I got feedback along the lines of, “That was interesting, but do you have a C# sample?” (My first sample was C++.) And my answer was, “Well… no.” So upon my return I set about building a C# sample. I decided that if my first sample really demonstrated most of the commonly used client API interfaces, then my next sample should do exactly what my first sample did, but using C#. I tried wherever possible to do a straight port to C#, leaving the existing class and file structure in place.

The use of COM interop to have my C# code call into the unmanaged UIA API was fairly straightforward. I added a build step to run tlbimp.exe on the unmanaged UIAutomationCore.dll to generate a wrapper I could call. Far more time consuming was adding all the P/Invoke code necessary to call the flat Magnification API. But once I’d done that, I was pleased with the results. I’ve not seen another C# app that uses these two Windows 7 accessibility APIs together.

I used the same approach to threading as I used in the first sample (that is, a main UI thread which doesn’t do anything at all with UIA, and two background MTA threads for all my UIA calls). And again this added to the time it took to complete the sample, so I’ll be referring back to this whenever I need another C# sample.

The image below shows similar behavior with this C# sample as with the original C++ sample.

 

Sample #3: https://code.msdn.microsoft.com/Windows-7-UI-Automation-6390614a

As I was having so much fun building the second sample, I couldn’t resist the urge to take the element magnification beyond the browser. Given that so many end-user AT apps involve keyboard focus tracking, it seemed that with a few adjustments my second sample could be turned into a focus tracker. I already had the thread-related code in place to manage a UIA event handler, so I replaced the StructureChanged event handler in my second sample with a FocusChanged event handler. An interesting aspect of this was that all my event handler needed to know was where the keyboard focus was on the screen. It didn’t need to interact with the element with focus in any other way, and so when it built up the cache request for the event handler it could tell UIA that it had no need for a reference to the element generating the event. This means there’s less work for UIA to do, and that’s good for performance.

Again, since I was having a great time with the sample by this point, when I set up the event handler I also requested that the name of the element with focus should be cached. By doing this I could have the name spoken when focus changes. It was interesting to move keyboard focus around and see the element with focus magnified and have the accessible name of the element spoken at the same time.
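
For readers working in C++ against the same Windows 7 UIA API, a focus handler along these lines might look roughly like the sketch below. It is not a port of the C# sample: FocusHandler and StartFocusTracking are hypothetical names, and printing to the console stands in for the magnification and speech steps. The cache request asks for the bounding rectangle and name, and sets the element mode so that no reference to the element is returned. The #ifdef _WIN32 guard is only there to keep non-Windows toolchains from tripping over the Windows headers.

```cpp
#ifdef _WIN32   // UIA ships with Windows; the guard only helps other toolchains
#include <windows.h>
#include <uiautomation.h>
#include <cassert>
#include <cstdio>
#pragma comment(lib, "ole32.lib")
#pragma comment(lib, "oleaut32.lib")

// Hypothetical handler: reports where focus went and what the element is called.
class FocusHandler final : public IUIAutomationFocusChangedEventHandler {
public:
    ULONG STDMETHODCALLTYPE AddRef() override {
        return InterlockedIncrement(&refs_);
    }
    ULONG STDMETHODCALLTYPE Release() override {
        const ULONG remaining = InterlockedDecrement(&refs_);
        if (remaining == 0) delete this;
        return remaining;
    }
    HRESULT STDMETHODCALLTYPE QueryInterface(REFIID riid, void** out) override {
        if (!out) return E_POINTER;
        if (riid == __uuidof(IUnknown) ||
            riid == __uuidof(IUIAutomationFocusChangedEventHandler)) {
            *out = static_cast<IUIAutomationFocusChangedEventHandler*>(this);
            AddRef();
            return S_OK;
        }
        *out = nullptr;
        return E_NOINTERFACE;
    }
    // Called by UIA when keyboard focus moves; only cached data is read.
    HRESULT STDMETHODCALLTYPE HandleFocusChangedEvent(
            IUIAutomationElement* sender) override {
        RECT bounds = {};
        BSTR name = nullptr;
        if (sender &&
            SUCCEEDED(sender->get_CachedBoundingRectangle(&bounds)) &&
            SUCCEEDED(sender->get_CachedName(&name))) {
            // A real tool would magnify `bounds` and speak `name` here.
            wprintf(L"Focus: \"%ls\" at (%ld, %ld)-(%ld, %ld)\n",
                    name ? name : L"", bounds.left, bounds.top,
                    bounds.right, bounds.bottom);
        }
        SysFreeString(name);
        return S_OK;
    }

private:
    LONG refs_ = 1;   // the creator holds the first reference
};

// Call this on the background MTA thread, and later call
// RemoveFocusChangedEventHandler(handler) on that same thread.
HRESULT StartFocusTracking(IUIAutomation* automation, FocusHandler* handler) {
    assert(automation && handler);
    IUIAutomationCacheRequest* cache = nullptr;
    HRESULT hr = automation->CreateCacheRequest(&cache);
    if (FAILED(hr)) return hr;
    cache->AddProperty(UIA_BoundingRectanglePropertyId);  // where focus is
    cache->AddProperty(UIA_NamePropertyId);               // what to speak
    // No reference to the element itself is needed, only the cached data.
    cache->put_AutomationElementMode(AutomationElementMode_None);
    hr = automation->AddFocusChangedEventHandler(cache, handler);
    cache->Release();
    return hr;
}
#endif  // _WIN32
```

Because the handler reads only get_CachedBoundingRectangle() and get_CachedName(), it makes no further cross-process calls of its own each time focus moves.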

The image below shows the results generated by the sample while moving focus around the Control Panel.


While this app is certainly only a sample, it has evolved to a point where I could start considering the additional actions it would need to take in order to become a useful standalone tool.

 

Summary

Thanks to the feedback from CSUN attendees, over the last few weeks I’ve been able to explore aspects of UIA which go beyond my original sample. The more I worked on these samples, the more excited I became about what I could do through UIA, and how I could use it in conjunction with other APIs. I started working with links in a browser, then wanted to highlight them on the screen, then wanted to highlight where focus was, and then wanted to speak the name of the element with focus. It’s got me thinking about what else I could do with all the data I can access through UIA. (Perhaps some users might find a window along the top of the screen useful if it was populated with the name of the element with focus? That might be handy if they find it a challenge to remember the purpose of certain buttons that only have icons on them.)

It’s going to be fun to consider what I can do with UIA in the future!