So how will you help people work with text? Part 2: The UIA Client


This series of posts describes how you can use UI Automation (UIA) as part of your solution to help people who find some aspect of working with text to be a challenge.

 

Introduction

A while back I had a chat with someone with a lot of experience in education, and she told me how valuable it is for students to have tools which allow text to be spoken. So I downloaded some of my earlier UIA client sample apps, and built a new app which could speak text shown in some apps. Details on what I did to make that happen are in the post “A recipe for an exciting assistive technology app: Throw three UIA samples together and stir vigorously!”, and I made the app available at http://herbi.org/HerbiReads/HerbiReads.htm, along with a short video. (I’ve not used the app for a long time. Hopefully it still works…)

You may feel that a simple tool could help someone that you know work with text. Perhaps that’s having the text spoken, highlighted, magnified, or its definition spoken. So it’s definitely worth considering whether you could build the tool yourself, and tune it to be as useful as possible for the person you know.

And in fact, maybe you feel that the tool could be useful to you as the customer. For example, when proof-reading an important e-mail before sending it, I always want it to be read out to me. That really helps me to spot errors that I don’t spot simply by reading it. So for a while I used my own sample at Windows 7 UI Automation Client API C# sample (e-mail reader) Version 1.1 to improve the quality of e-mails I send. (Since then I’ve discovered that Outlook and Word have built-in ways of having text content spoken, so I no longer use my own tool for that.)

Below are some details on how you can use UIA to interact with text in apps. The discussion does not focus on the various ways of triggering the UIA action (for example, through keyboard or mouse action), or on the action taken with the text once you’ve accessed it (for example, calling into some web service to get the definition of the word).

 

Building a UIA client app

When building a UIA client app, I tend to build a WinForms C# app. I could write a C++ Win32 app, but WinForms makes so many things quick ‘n’ easy for me, that I usually go with WinForms. (I’m not familiar with building WPF apps.)

It’s interesting to note that I’m using desktop UI frameworks here rather than Windows Store app frameworks like XAML and WinJS. That’s because XAML and WinJS apps don’t have access to the Windows UIA client API.

But because I choose to build a C# app, I’ll need a managed wrapper around the native Windows UIA API. So I use the tlbimp.exe tool to generate the wrapper for me. The tlbimp.exe tool will be somewhere in your Windows SDK folders. On my Windows 10 machine I’d run the following to generate the wrapper:

"C:\Program Files (x86)\Microsoft SDKs\Windows\v8.1A\bin\NETFX 4.5.1 Tools\x64\tlbimp.exe" c:\windows\system32\uiautomationcore.dll /out:Interop.UIAutomationCore.dll

 

To illustrate this, I created a new WinForms app called “SpeakWord”. I then ran the command above to generate the wrapper, and included a reference to the output wrapper in my new project.

I can then see all the interesting UIA classes available to me by using Visual Studio’s Object Browser.

Figure 1: Using the Visual Studio Object Browser to see what UIA classes are available to my app. 

 

I should say that once I’ve created this wrapper, I can use it in future projects too. I don’t have to explicitly generate it every time I start working on a new app. So while this can seem complicated, it takes no time to get going on an app once you’re familiar with the steps.
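As a quick check that the wrapper is wired up, a minimal sketch along the following lines might help. It assumes the wrapper was generated as Interop.UIAutomationCore.dll by the command above and is referenced by the project. The WrapperCheck class and its method are hypothetical names used only for illustration.

```csharp
using Interop.UIAutomationCore;

namespace SpeakWord
{
    // Hypothetical helper, only for checking that the wrapper can be used.
    public static class WrapperCheck
    {
        public static string GetDesktopName()
        {
            // CUIAutomation is the UIA coclass exposed by the generated wrapper.
            IUIAutomation uiAutomation = new CUIAutomation();

            // The root element of the UIA tree represents the desktop.
            IUIAutomationElement root = uiAutomation.GetRootElement();

            // If this returns without throwing, the wrapper is usable.
            return root.CurrentName;
        }
    }
}
```

Calling WrapperCheck.GetDesktopName() from the form's Load handler and showing the result in a message box is one simple way to confirm everything is in place before writing the real client code.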

So having created my new app, I thought it might be useful if the customer could move the mouse over a word of interest, and press a key to hear that word spoken. The only time-consuming bit about implementing that was adding a global hotkey handler, as I’ve not done that from C# before. The rest was really quick for me. All I had to do was paste in a code snippet I’d recently uploaded to the thread “When I try to use UI Automation for PowerPoint 2013, I can only get the first character/word when I use RangeFromPoint”, and add a few lines to call the .NET SpeechSynthesizer.

Overall this was pretty quick to do, and has real potential given that an app which helps someone know how a word is pronounced can be really useful. (I’m assuming here that the text-to-speech engine being used does a good job at pronouncing the text as expected.)

The contents of the file containing the code for the bulk of the app can be found at the end of this post.

 

So what is the app doing?

When an assistive technology (AT) app is working with an app showing text, it’s not enough just to know what that text is. The AT app may need to be able to access the text in different ways. For example, it may need to get the text beneath the mouse cursor, the text at the insertion point (i.e. at the caret), or the selected text. The UIA client app can do this through use of the UIA “Text pattern”. Details on the Text pattern can be found at IUIAutomationTextPattern, and that interface has a variety of methods useful for accessing text. (There’s also an IUIAutomationTextPattern2 interface with a couple more methods in it.)
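The SpeakWord app in this post works from the mouse cursor, but the selection and caret cases can be sketched in a similar way. The following is a rough illustration rather than code from the app. It assumes the element with keyboard focus is the one supporting the Text pattern, and that the provider returns a zero-length range at the caret when nothing is selected, which not every provider will do. The CaretText class is a hypothetical name.

```csharp
using Interop.UIAutomationCore;

namespace SpeakWord
{
    // Hypothetical helper, not part of the SpeakWord app.
    public static class CaretText
    {
        private const int UIA_TextPatternId = 10014;

        // Returns the selected text, or the word at the caret if nothing is
        // selected. Returns "" if no Text pattern or selection is available.
        public static string GetSelectionOrCaretWord(IUIAutomation uiAutomation)
        {
            IUIAutomationElement focused = uiAutomation.GetFocusedElement();
            if (focused == null)
            {
                return "";
            }

            IUIAutomationTextPattern textPattern =
                focused.GetCurrentPattern(UIA_TextPatternId) as IUIAutomationTextPattern;
            if (textPattern == null)
            {
                return "";
            }

            IUIAutomationTextRangeArray selection = textPattern.GetSelection();
            if (selection == null || selection.Length == 0)
            {
                return "";
            }

            // Only the first selected range is considered in this sketch.
            IUIAutomationTextRange range = selection.GetElement(0);

            string text = range.GetText(100);
            if (string.IsNullOrEmpty(text))
            {
                // Assumption: an empty range here marks the caret position,
                // so expand it to the word around the caret.
                range.ExpandToEnclosingUnit(TextUnit.TextUnit_Word);
                text = range.GetText(100);
            }

            return text ?? "";
        }
    }
}
```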

More details around the UIA Text pattern can be found at Text and TextRange Control Patterns.

A UIA “pattern” is used to describe the programmatic behavior exposed by a UIA element. For example, a button should support the “Invoke” pattern, allowing it to be programmatically invoked. In the case of text, if a UIA element is to programmatically expose its text in the most useful way possible, then it will support the Text pattern.

However, not all document-related apps which show text support the UIA Text pattern. So if I’m interested in a particular app, I’ll point the Inspect SDK tool to it first. If the app claims to support the Text pattern, then it’ll expose an IsTextPatternAvailable property of true. A value of true on that property doesn’t necessarily mean the app will do a good job at supporting the Text pattern, but at least it’s claiming to support it.
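The property that Inspect shows can also be read from client code. A minimal sketch, assuming the id for the property is 30040 (UIA_IsTextPatternAvailablePropertyId in UIAutomationClient.h), and using a hypothetical TextPatternCheck class:

```csharp
using Interop.UIAutomationCore;

namespace SpeakWord
{
    // Hypothetical helper, only for illustration.
    public static class TextPatternCheck
    {
        private const int UIA_IsTextPatternAvailablePropertyId = 30040;

        // True if the element claims to support the Text pattern. As with
        // Inspect, this says nothing about how well the pattern is supported.
        public static bool ClaimsTextPattern(IUIAutomationElement element)
        {
            object value =
                element.GetCurrentPropertyValue(UIA_IsTextPatternAvailablePropertyId);

            return (value is bool) && (bool)value;
        }
    }
}
```

Note that this is a cross-process call in its own right, so in a real app it may be preferable to add the property to a cache request instead of asking for it separately.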

So when building the app described in “A recipe for an exciting assistive technology app: Throw three UIA samples together and stir vigorously!”, I first pointed Inspect to WordPad and Word 2013. In both cases, Inspect showed me that the IsTextPatternAvailable property was true. I also pointed Inspect to Word 2010 and found that the IsTextPatternAvailable property was false in that app. So my helpful AT tool just isn’t going to work with Word 2010.

Figure 2: The Inspect SDK tool showing that Word 2013 claims to support the UIA Text pattern.

 

So having learned that the provider app that I’m interested in does claim to support the UIA Text pattern, I want my app to get that Text pattern from the provider app. A pattern is accessed through the UIA element that’s implementing the pattern. So first I need to get at that UIA element.

I’m going to find the element by asking UIA to return to me the element beneath the mouse cursor. I could then ask UIA to go back to the provider app and get me the Text pattern from the element. But that would involve two cross-process calls, and I like to keep the number of cross-process calls I make to a minimum. So I’m going to ask UIA to cache a reference to the Text pattern when it gets the element.

By the way, I tend to use explicit values for pattern and property ids in my client code, pulled from UIAutomationClient.h. I don’t have to do that, and instead I could use some value accessed through the managed wrapper I generated earlier. But years ago, VS gave me some warning when I did that. I don’t remember the details there, and I’ve simply got into the habit of using the values directly.
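If explicit values are used, one way to keep them readable is to gather them in a single place. The sketch below is only a suggestion. The UiaIds class is a hypothetical name, and the values are the ones defined in UIAutomationClient.h for the ids mentioned in this post.

```csharp
namespace SpeakWord
{
    // Hypothetical holder for the explicit UIA ids used by the client code.
    // Values are taken from UIAutomationClient.h.
    public static class UiaIds
    {
        // Pattern ids.
        public const int UIA_InvokePatternId = 10000;
        public const int UIA_TextPatternId = 10014;
        public const int UIA_TextPattern2Id = 10024;

        // Property ids.
        public const int UIA_IsTextPatternAvailablePropertyId = 30040;

        // Event ids.
        public const int UIA_Text_TextSelectionChangedEventId = 20014;
    }
}
```

Client code can then pass UiaIds.UIA_TextPatternId rather than a bare 10014, without depending on how the constants are exposed by the generated wrapper.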

So this is how I got the UIA element of interest:

    // We're interested in the text beneath the mouse cursor.
    Point ptCursor = Cursor.Position;

    tagPOINT pt;
    pt.x = ptCursor.X;
    pt.y = ptCursor.Y;

    int patternIdText = 10014; // UIA_TextPatternId
    IUIAutomationCacheRequest cacheRequestTextPattern =
        _uiAutomation.CreateCacheRequest();
    cacheRequestTextPattern.AddPattern(patternIdText);

    // Now get the UIA element beneath the mouse cursor.
    IUIAutomationElement element =
        _uiAutomation.ElementFromPointBuildCache(pt, cacheRequestTextPattern);

 

Having got the element, I then try to access the Text pattern from it. I didn’t bother first checking the element’s IsTextPatternAvailable property to see whether the element claims to support the Text pattern. In this simple app, I’m only interested in whether I can get a Text pattern or not.

    IUIAutomationTextPattern textPattern =
        element.GetCachedPattern(patternIdText);
    if (textPattern != null)
    {
        …

 

So there we have it. I now have a Text pattern associated with the text beneath the mouse cursor, and that works in WordPad and Word 2013, and some other important apps too.

Having got access to the text through the Text pattern, I can then have some fun working with the text. This is done through a TextRange, (or TextRange2 if you need the additional method in that). MSDN describes a TextRange as an interface that “Provides access to a span of continuous text”.  You work with a TextRange through the IUIAutomationTextRange interface, and that has all sorts of interesting members. For example:

GetText() - Get the text associated with a range.

FindText() - Find text within a range.

FindAttributes() - Find text with specific UIA text attribute within a range.
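As an illustration of GetText() and FindText() working together, here is a minimal sketch that searches a whole document for some text and returns the line containing the first match. It assumes that tlbimp has mapped the native BOOL parameters of FindText() to int in the wrapper, which is worth confirming in the Object Browser. The TextSearch class is a hypothetical name.

```csharp
using Interop.UIAutomationCore;

namespace SpeakWord
{
    // Hypothetical helper, only for illustration.
    public static class TextSearch
    {
        // Returns the line containing the first match, or "" if not found.
        public static string FindLineContaining(
            IUIAutomationTextPattern textPattern, string textToFind)
        {
            // A range covering all the text exposed by the provider.
            IUIAutomationTextRange document = textPattern.DocumentRange;

            // Search forward (backward = 0), ignoring case (ignoreCase = 1).
            IUIAutomationTextRange found = document.FindText(textToFind, 0, 1);
            if (found == null)
            {
                return "";
            }

            // Grow the match to the line around it, then read that line.
            found.ExpandToEnclosingUnit(TextUnit.TextUnit_Line);

            return found.GetText(200) ?? "";
        }
    }
}
```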

 

And there are also very helpful ways to move through the text. For example:

ExpandToEnclosingUnit() – Expand the range to include more text. E.g. expand a range containing a word to contain all the text in the paragraph in which the word lies.

Move() – Move the range forward or backward in the text by some unit such as a word or line.
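To make those two methods concrete, here is a minimal sketch that starts from a range containing a word, and gets either the paragraph around it or the word that follows it. Both helpers work on a clone so that the caller's range is left untouched. The RangeNavigation class is a hypothetical name, and how word boundaries are decided is up to the provider.

```csharp
using Interop.UIAutomationCore;

namespace SpeakWord
{
    // Hypothetical helper, only for illustration.
    public static class RangeNavigation
    {
        // Returns all the text in the paragraph containing the range.
        public static string GetEnclosingParagraph(IUIAutomationTextRange range)
        {
            IUIAutomationTextRange paragraph = range.Clone();
            paragraph.ExpandToEnclosingUnit(TextUnit.TextUnit_Paragraph);

            // A length of -1 asks for all the text in the range.
            return paragraph.GetText(-1) ?? "";
        }

        // Returns the word after the one in the range, or "" if there's none.
        public static string GetNextWord(IUIAutomationTextRange wordRange)
        {
            IUIAutomationTextRange next = wordRange.Clone();

            // Move() returns the number of units actually moved.
            int moved = next.Move(TextUnit.TextUnit_Word, 1);
            if (moved == 0)
            {
                return "";
            }

            // Make sure the range covers the whole word at the new position.
            next.ExpandToEnclosingUnit(TextUnit.TextUnit_Word);

            return next.GetText(100) ?? "";
        }
    }
}
```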

 

So going back to the quick app I wrote, I wanted to get the word beneath the mouse cursor. In order to do this, I needed to use the Text pattern that I got earlier, and then get the TextRange from that Text pattern where the mouse cursor is.

    IUIAutomationTextRange range = textPattern.RangeFromPoint(pt);
    if (range != null)
    {
        …

 

Now, this is where things can get interesting. While the provider app that I’m working with might provide me with a Text pattern and TextRange, that doesn’t necessarily mean the app has implemented these UIA interfaces as I expect. MSDN says that RangeFromPoint() should return a “degenerate” TextRange. A degenerate TextRange is zero-length, and I can expand it or move from it through the text in a number of ways. But someone pointed out in the thread “When I try to use UI Automation for PowerPoint 2013, I can only get the first character/word when I use RangeFromPoint” that PowerPoint 2013 doesn’t return a degenerate TextRange following the call to RangeFromPoint(). Rather it returns a TextRange which includes all the text in the text box beneath the mouse cursor, and that means I can’t actually tell which word is beneath the mouse cursor. So while some apps may do what you expect when RangeFromPoint() is called (e.g. PowerPoint Online, Word 2013, Outlook 2013), others may not.
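A client can check what it was given before relying on it. The sketch below treats a range as degenerate (zero-length) when its start and end endpoints compare as equal. This is only one possible check, and it assumes the provider implements CompareEndpoints() sensibly. The RangeCheck class is a hypothetical name.

```csharp
using Interop.UIAutomationCore;

namespace SpeakWord
{
    // Hypothetical helper, only for illustration.
    public static class RangeCheck
    {
        // True if the range's start and end endpoints are at the same place.
        public static bool IsDegenerate(IUIAutomationTextRange range)
        {
            int comparison = range.CompareEndpoints(
                TextPatternRangeEndpoint.TextPatternRangeEndpoint_Start,
                range,
                TextPatternRangeEndpoint.TextPatternRangeEndpoint_End);

            // Zero means the two endpoints are at the same position.
            return comparison == 0;
        }
    }
}
```

If IsDegenerate() returns false for the range returned by RangeFromPoint(), expanding that range to a word won't necessarily give the word beneath the mouse cursor, so the app might choose not to speak anything in that case.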

Actually, it’s worth focusing on this…

When you hit unexpected text data being returned from provider apps, it might not be obvious as to whether the provider app is really the problem, or somehow your client code is not requesting the data you intended. So before writing your UIA client code, it can be worth pointing the SDK Text Explorer tool at the provider app. That tool really isn’t the most intuitive to use, but it can help confirm that unexpected data is being returned from the provider app.

Another example of inconsistent behavior relates to WordPad and Word 2013 behaving differently in the data they return when a UIA client app calls IUIAutomationTextPattern::GetVisibleRanges(). By using the Text Explorer tool, I found that WordPad running on Windows 10 doesn't include the text that's clipped out of view, but Word 2013 does return the clipped text.
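For comparing what different providers return from GetVisibleRanges(), a small helper that collects the text of each returned range can be handy alongside the Text Explorer tool. This is a minimal sketch with a hypothetical VisibleText class. It makes no attempt to decide whether the returned text really is visible.

```csharp
using System.Text;
using Interop.UIAutomationCore;

namespace SpeakWord
{
    // Hypothetical helper, only for illustration.
    public static class VisibleText
    {
        // Returns the text of each range the provider reports as visible,
        // one range per line.
        public static string GetVisibleRangesText(IUIAutomationTextPattern textPattern)
        {
            StringBuilder builder = new StringBuilder();

            IUIAutomationTextRangeArray ranges = textPattern.GetVisibleRanges();
            if (ranges != null)
            {
                for (int i = 0; i < ranges.Length; i++)
                {
                    // A length of -1 asks for all the text in the range.
                    builder.AppendLine(ranges.GetElement(i).GetText(-1));
                }
            }

            return builder.ToString();
        }
    }
}
```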

Ok, once again going back to the app, having got (what should be) a degenerate TextRange for the text beneath the mouse cursor, I can expand that to include the word of interest. I then get the text for that word, and have it spoken.

    range.ExpandToEnclosingUnit(TextUnit.TextUnit_Word);

    // Set a reasonable limit on the length of the word returned.
    wordToSpeak = range.GetText(100);

 

So with those few lines of code above, I can access the word beneath the mouse cursor.

Figure 3: The simple app speaking the word beneath the mouse cursor in MSDN documentation shown in the Edge browser.

 

A word of warning around working with UIA events

So far I’ve mentioned something of UIA properties and patterns, but another very important aspect of UIA relates to events. Events allow your UIA client app to react to things that are going on in the provider app’s UI.

For example, your app might want to be notified whenever your customer moves the caret around the provider app’s text. So you could register for the UIA_Text_TextSelectionChangedEventId event (as listed at Event Identifiers). But as helpful as event handlers are, sometimes they need to be used with care. Historically there have been constraints around what your event handlers should do. A classic constraint was not to call back into UIA from inside your UIA event handler. Some of these constraints were relaxed in Windows 8.1, and relaxed further in Windows 10. But if you hit unexpected delays in your event handler, you may be interested in reading the discussion in the thread “UI Automation events stop being received after a while monitoring an application and then restart after some time”.
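A minimal sketch of registering for that event is shown below. It assumes the event id is 20014 (UIA_Text_TextSelectionChangedEventId in UIAutomationClient.h). The class names and the callback are hypothetical. To stay on the safe side of the constraints mentioned above, the handler makes no calls back into UIA, and simply passes the notification on.

```csharp
using System;
using Interop.UIAutomationCore;

namespace SpeakWord
{
    // Hypothetical handler which only forwards the notification.
    public class SelectionChangedHandler : IUIAutomationEventHandler
    {
        public const int UIA_Text_TextSelectionChangedEventId = 20014;

        private readonly Action _onSelectionChanged;

        public SelectionChangedHandler(Action onSelectionChanged)
        {
            _onSelectionChanged = onSelectionChanged;
        }

        // Called by UIA, typically not on the app's UI thread.
        public void HandleAutomationEvent(IUIAutomationElement sender, int eventId)
        {
            if (eventId == UIA_Text_TextSelectionChangedEventId)
            {
                // Keep this quick, and do any real work elsewhere.
                _onSelectionChanged();
            }
        }
    }

    // Hypothetical helper for adding and removing the handler.
    public static class SelectionEvents
    {
        public static SelectionChangedHandler Register(
            IUIAutomation uiAutomation, IUIAutomationElement element,
            Action onSelectionChanged)
        {
            SelectionChangedHandler handler =
                new SelectionChangedHandler(onSelectionChanged);

            // No cache request is supplied in this sketch.
            uiAutomation.AddAutomationEventHandler(
                SelectionChangedHandler.UIA_Text_TextSelectionChangedEventId,
                element, TreeScope.TreeScope_Subtree, null, handler);

            return handler;
        }

        public static void Unregister(
            IUIAutomation uiAutomation, IUIAutomationElement element,
            SelectionChangedHandler handler)
        {
            uiAutomation.RemoveAutomationEventHandler(
                SelectionChangedHandler.UIA_Text_TextSelectionChangedEventId,
                element, handler);
        }
    }
}
```

In a WinForms app, the callback passed to Register() might do no more than call BeginInvoke() on the form, so that any follow-up work happens back on the UI thread after the event handler has returned.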

 

Summary

Once you’ve recognized how some tool could help someone you know work with text, consider how you can achieve your goals using the UIA Text pattern and TextRange.

If you want to access the text beneath the mouse cursor, or the text that’s currently selected, take a look at A recipe for an exciting assistive technology app: Throw three UIA samples together and stir vigorously! For more advanced interaction with text, take a look at Windows 7 UI Automation Client API C# sample (e-mail reader) Version 1.1. That mail-related sample sequentially speaks each paragraph, and uses the Magnification API to magnify the paragraph.

And given that it’s been quite a while since I’ve run that mail-related sample, I just downloaded it and built it in VS 2015. (I had to agree to building it with a more recent version of .NET, and add a reference to Microsoft.CSharp.) I then tweaked it to look at Outlook 2013 e-mail UI rather than the Windows Mail app UI that I’d originally targeted when building the sample. And of course I had to run the Inspect SDK tool when doing this in order to learn about the properties of the e-mail UI that I wanted to get the text from. Having done that, my sample app sequentially spoke and highlighted the paragraphs in the e-mail.

Figure 4: A UIA client app accessing text paragraphs shown in an e-mail composition window.

 

So a polished-up version of this mail-reading app could be a valuable tool for many people, (and like I said, I used it myself for a while). It was also a ton of fun to build! 🙂

I hope you find building these sorts of apps as rewarding as I do.

Guy

 

Posts in this series:

So how will you help people work with text? Part 1: Introduction

So how will you help people work with text? Part 2: The UIA Client

So how will you help people work with text? Part 3: The UIA Provider

 

P.S. Here’s the code of interest for the simple app that I built to speak the text beneath the mouse cursor.

using System;
using System.Drawing;
using System.ComponentModel;
using System.Runtime.InteropServices;
using System.Speech.Synthesis;
using System.Windows.Forms;
using Interop.UIAutomationCore;

namespace SpeakWord
{
    public partial class FormSpeakWord : Form
    {
        private IUIAutomation3 _uiAutomation;

        private SpeechSynthesizer _speechSynthesizer;

        private IntPtr hotkeyIdSpeakWordBeneathMouseCursor = (IntPtr)1001;

        public FormSpeakWord()
        {
            InitializeComponent();
        }

        private void buttonClose_Click(object sender, EventArgs e)
        {
            this.Close();
        }

        private void FormSpeakWord_Load(object sender, EventArgs e)
        {
            // Get an IUIAutomation3 interface for all interaction with UIA.
            _uiAutomation = (IUIAutomation3)new CUIAutomation8();

            // Get a SpeechSynthesizer in order to speak the word accessed through UIA.
            _speechSynthesizer = new SpeechSynthesizer();

            // Get notified when the F8 key is pressed.
            Win32.RegisterHotKey(this.Handle, (int)hotkeyIdSpeakWordBeneathMouseCursor, 0, 0x77 /* VK_F8 */);
        }

        protected override void OnClosing(CancelEventArgs e)
        {
            Win32.UnregisterHotKey(this.Handle, (int)hotkeyIdSpeakWordBeneathMouseCursor);

            base.OnClosing(e);
        }

        protected override void WndProc(ref Message m)
        {
            base.WndProc(ref m);

            if (m.Msg == 0x0312) // WM_HOTKEY
            {
                if (m.WParam == hotkeyIdSpeakWordBeneathMouseCursor)
                {
                    // Our hotkey's been pressed!
                    SpeakWordBeneathMouseCursor();
                }
            }
        }

        private void SpeakWordBeneathMouseCursor()
        {
            string wordToSpeak = GetWord(); // Get the word beneath mouse cursor.
            if (wordToSpeak != "")
            {
                _speechSynthesizer.SpeakAsync(wordToSpeak);
            }
        }

        private string GetWord()
        {
            string wordToSpeak = "";

            // We're interested in the text beneath the mouse cursor.
            Point ptCursor = Cursor.Position;

            tagPOINT pt;
            pt.x = ptCursor.X;
            pt.y = ptCursor.Y;

            int patternIdText = 10014; // UIA_TextPatternId
            IUIAutomationCacheRequest cacheRequestTextPattern =
                _uiAutomation.CreateCacheRequest();
            cacheRequestTextPattern.AddPattern(patternIdText);

            // Now get the UIA element beneath the mouse cursor.
            IUIAutomationElement element =
                _uiAutomation.ElementFromPointBuildCache(pt, cacheRequestTextPattern);

            // Does the element support the Text pattern?
            IUIAutomationTextPattern textPattern =
                element.GetCachedPattern(patternIdText);
            if (textPattern != null)
            {
                // Now get the degenerate (zero-length) TextRange where the mouse is.
                IUIAutomationTextRange range = textPattern.RangeFromPoint(pt);
                if (range != null)
                {
                    // Expand the TextRange to include the word around it.
                    range.ExpandToEnclosingUnit(TextUnit.TextUnit_Word);

                    // Set a reasonable limit for speaking the word returned.
                    wordToSpeak = range.GetText(100);
                }
            }

            return wordToSpeak;
        }
    }

    public class Win32
    {
        [DllImport("user32.dll")]
        public static extern bool RegisterHotKey(IntPtr hWnd, int id, uint fsModifiers, uint vk);

        [DllImport("user32.dll")]
        public static extern bool UnregisterHotKey(IntPtr hWnd, int id);
    }
}

 


Comments (15)

  1. tim says:

    Guy, this is great. I find though that I cannot get Windows UI Automation (not the .NET one either) to get text from Edge. Does it use a different pattern? When I list patterns using .NET I get no supported patterns.

  2. tim says:

    Is there something I need to turn on to get Edge to provide ui Automation?

  3. tim says:

    caret browsing, I turned on caret browsing and now my Win UIA (but still not .NET UIA) seems to pull text from Edge

  4. tim says:

    Now it works, even with caret off, maybe I was just imagining things. I do find I have to use the Windows UIA and not .NET UIA. Now the next thing that doesn't seem to work is

    CompareEndpoints

    I use

        range.ExpandToEnclosingUnit(TextUnit.TextUnit_Character);    // for chinese
        text += "character: " + range.GetText(-1).Trim() + Environment.NewLine;
        var charRange = range.Clone();
        range.ExpandToEnclosingUnit(TextUnit.TextUnit_Word);    // for chinese
        text += "word: " + range.GetText(-1).Trim() + Environment.NewLine;
        var wordRange = range.Clone();
        var rects = wordRange.GetBoundingRectangles();
        int charStartPoint = wordRange.CompareEndpoints(TextPatternRangeEndpoint.TextPatternRangeEndpoint_Start, charRange, TextPatternRangeEndpoint.TextPatternRangeEndpoint_Start);
        int charEndPoint = wordRange.CompareEndpoints(TextPatternRangeEndpoint.TextPatternRangeEndpoint_End, charRange, TextPatternRangeEndpoint.TextPatternRangeEndpoint_End);
        text += charStartPoint + ", " + charEndPoint + Environment.NewLine;

    and this works great in .NET UIA but not in Windows UIA, the start and end points are always -1 -1.

  5. Hi Tim,

    Are you finding you get the unexpected -1 results for all UI where you run this code, or only with specific UI? I'd like to try to reproduce this myself using the Windows UIA API, so that I can investigate further. If you could give me an example of specific UI where the unexpected values are returned, that would help me.

    Thanks,

    Guy

  6. tim says:

    Dear Guy,

    I realized what was happening. I can get text from plain old paragraphs, but try going to a table or even to google news. I cannot get the TextPattern to work. Perhaps I need to drill down more? Any advice would be helpful.

    Some issues this morning:

    1. Try to get text from the article snippet in news.google.com – I get no textPattern

    2. Try to get text from the article headline, it retrieves the text into "Name" (because it is a link? can I get a RangeFromPoint from it?)

    3. Try to get text from the left most column in msdn.microsoft.com/…/gg701984(v=vs.85).aspx

    4. Why is class name missing for Edge?

    5. Is it possible to get AriaRoles?

    6. Why can I not use the predefined IDs? I can look up UIA_PropertyIds.UIA_ClassNamePropertyId in VS, but when I try to compile it says Interop type cannot be embedded, use applicable interface instead…

    I do find that the Windows UIA is an improvement over .NET UIA in terms of accessibility though.

    Are you in Redmond, would love to buy you lunch. I am not far in Bellingham.

    Thanks,

    Tim

  7. tim says:

    BTW, I resolved the -1 issues. It seems that Edge just returns whether you are behind or in front. I was able to move a clone of my target range around until it matched. This is probably because this is html based and exact numbers may not be computable ahead of time.

  8. tim says:

    I think I need to dig down with something like this, though I'm not sure yet how to get my cached request to have children.

        var children = element.GetCachedChildren();
        if (children == null) return text;
        for (int i = 0; i < children.Length; i++)
        {
            textPattern = children.GetElement(i).GetCurrentPattern(patternIdText);
            if (textPattern != null)
            {
                wordFromRange(sw, pt, text, textPattern);
            }
        }

  9. tim says:

    Guy I feel nervous about switching from .NET UIA to native – is there no .NET UIA switch to make it equivalent to native? I'm trying now to get the cached children, so I think I need to set the TreeScope in the cache request. When I do

    cacheReq.TreeScope = TreeScope.TreeScope_Children;      // causes E_FAIL

    cacheReq.TreeScope = TreeScope.TreeScope_Descendants;      // causes E_FAIL

    I get HRESULT E_FAIL.

  10. tim says:

    Sorry for all the messages here. I figured it out, I did not read the instructions. I needed cacheReq.TreeScope = TreeScope.TreeScope_Children | TreeScope.TreeScope_Element;

  11. tim says:

    I'm no longer getting E_FAIL but items such as Hyperlinks in Edge do not seem to have children that I can parse. 🙁

  12. tim says:

    I want to use msdn.microsoft.com/…/ee671665(v=vs.85).aspx – but hyperlinks, titles, table cells, they all seem to have no children when I GetCachedChildren even with the TreeScope in the cache request.

  13. tim says:

    Is it possible to get RangeFromPoint if there is no text pattern?

  14. RangeFromPoint() is only available to UIA clients if the provider supports the Text pattern. (Most of what the Text pattern does is provide a variety of ways for a client to access TextRanges.)
