UITest framework – IE Plugin – Part 1

Introduction to IE Plugin

The plugin for Internet Explorer uses mshtml to access the DOM constructed by the IE engine.

The Document Object Model is a platform- and language-neutral interface that allows programs and scripts to dynamically access and update the content, structure and style of documents. The HTML DOM defines the objects and properties of all HTML elements, and the methods (interface) to access them.


Configs Supported

The record & playback feature for web applications is supported only on IE 7.0 and above. Applications written in ASP.NET, PHP, JSP, ASP, Ajax and SharePoint, as well as Web 2.0 applications, are supported. Record & playback is not supported on ActiveX controls or on 64-bit IE.


Recording on Web application controls


Whenever the user performs an action on a UI-object, the recorder saves the UI-object’s properties in a separate segment called the UIMap. For each UI-object the recorder saves two kinds of properties: primary properties and secondary properties.

Primary Property and Secondary Property:

Whenever we act on a control, the recorder tracks the UI-object at the screen coordinate where the user performed the mouse/keyboard action. The recorder locates the UI-object from the coordinate and then gets its properties.

The sample recorded UIObject node shows the information that was captured for a button.



For every control we record on, we capture the primary properties and secondary properties of that control. The above UIObject node highlights the primary and secondary/filter properties that are captured for the button.

Properties like ControlType, id, name and TagName make up the primary set of properties. These properties are the basic characteristics of the control.

Apart from these we capture another set of properties called the secondary properties or filter properties. Properties like title, controldefinition (outerhtml – innerhtml), taginstance etc. make up the secondary set. These properties help us filter the control when multiple controls have the same set of primary properties. How exactly these two sets of properties are used is explained in the Search section.


The Query Id of a control is a string indicating the hierarchy of controls. After the recorder has located the UI-object at the screen coordinate of the user action and obtained its properties, it traverses the UI hierarchy to gather information about the parents of that object and stores them. For example, the query id for a click on a button inside an IE frame is recorded as TopLevelIEWindow -> FrameWindow -> PageDocument -> Button.

The hierarchy captured during recording is used during playback to find the control. We do not capture all the elements in the hierarchy of a UIObject. If the user acted on an object that has a characteristic property, such as a name or id that uniquely identifies the control, we do not record the intermediate controls in its hierarchy; we record the object as directly present under the page document. If the control has no characteristic property, we record all of its properties and then traverse to its parent node and obtain the parent's properties. If the parent node has a rich set of properties, we store it in the query id. If the parent node also lacks a characteristic property, we continue traversing to its parents in search of good intermediate controls. This is how the query id is generated, and it aids the search for the control.
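The pruning rule above can be sketched roughly as follows. This is a language-neutral illustration in Python, not the plugin's code: the `Node` type, the choice of "non-empty id or name" as the test for a characteristic property, and the decision to skip ancestors that lack one are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    # Hypothetical stand-in for a recorded UI-object.
    control_type: str
    props: dict = field(default_factory=dict)
    parent: Optional["Node"] = None


def has_characteristic_property(node):
    # Assumption: a non-empty id or name counts as "characteristic".
    return bool(node.props.get("id") or node.props.get("name"))


def build_query_id(target, document):
    """Return the hierarchy string from the page document down to the target."""
    chain = [target]
    if not has_characteristic_property(target):
        # Walk up until an ancestor with a characteristic property is found.
        node = target.parent
        while node is not None and node is not document:
            if has_characteristic_property(node):
                chain.append(node)
                break
            node = node.parent
    chain.append(document)
    return " -> ".join(n.control_type for n in reversed(chain))
```

A control with an id is recorded directly under the document, while a control without one pulls in the nearest well-identified ancestor.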


The aggregator is an important component of the recorder. It is a rule-processing system that waits on the Raw Action List and reads each new action as it arrives. It then goes through its rules to either combine (aggregate) the action with previous events or drop it. The aggregators run in a separate low-priority thread, and aggregation happens in this thread on the fly while recording is in progress. Some examples of aggregation rules are:

1. Combine the mouse-down and mouse-up events on a button into a mouse click.

2. Combine a click on the combobox dropdown button and a click on an item in the expanded list into a setvalue action on the combobox.

3. Combine a click on the fileupload button and the choice of a file into a setvalue action on the fileupload control.

4. SendKeys of ‘a’, ‘b’, ‘c’ in a textbox is aggregated into a setvalue of ‘abc’ in the textbox.
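Rules 1 and 4 can be illustrated with a small sketch. It is written in Python purely for illustration; the tuple shape `(kind, control, value)` and the action names are hypothetical, and the real aggregator works on a live queue in a background thread rather than on a finished list.

```python
def aggregate(raw_actions):
    """Fold raw (kind, control, value) actions into higher-level actions."""
    out = []
    for kind, control, value in raw_actions:
        prev = out[-1] if out else None
        same_control = prev is not None and prev[1] == control
        if kind == "mouse_up" and same_control and prev[0] == "mouse_down":
            # Rule 1: mouse-down followed by mouse-up becomes a click.
            out[-1] = ("click", control, None)
        elif kind == "send_keys" and same_control and prev[0] == "set_value":
            # Rule 4: further keystrokes extend the pending set_value.
            out[-1] = ("set_value", control, prev[2] + value)
        elif kind == "send_keys":
            # Rule 4: the first keystroke starts a set_value.
            out.append(("set_value", control, value))
        else:
            out.append((kind, control, value))
    return out
```

Each rule only looks at the most recent aggregated action, which is what lets the aggregation run on the fly.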

NavigateToUrl Action:

This is an aggregated action that is recorded whenever a user launches a new browser instance or navigates to a different page in an existing browser. A user may launch IE in any of the following ways: from the desktop, from quick launch, or from the Start menu. All the intermediate actions involved in launching IE are aggregated into a navigateToUrl action. For navigation in an existing browser, the click on the address bar, the typing of the URL, and either the SendKeys of {Enter} or the selection of a URL in the dropdown list are aggregated into a navigateToUrl action.

The following table shows how the different user actions on different control types get aggregated.


Control Name | Possible User Actions | Aggregated action
Textbox, TextArea | Click on control + type abc in the control | SetValue(abc)
Combobox | Click on combobox (combobox expanded) + click on list item (or) set focus on combobox + type characters to select a list item | SetValue on the combobox
Checkbox, RadioButton | Selection/deselection through mouse click |
Listbox | Click on listitem in a listbox (or) selection of listitem through up/down arrow |
MultiSelect Listbox | Click on listitem + pressing Ctrl + selecting other listitems (or) click on listitem + doing a drag action to select multiple listitems | SetValue(listitem1,listitem2,listitem3, …)
FileUpload | Click on fileupload control/browse button + selection of a file in the file upload dialog | SetValue on the fileupload control
Back, Forward, Refresh and Stop buttons on IE Toolbar | Click on the mentioned buttons |






Hover is an action type that is specific to web applications. A control changing its properties or behaviour when the mouse moves over it is very common in web applications. A few common examples are:

1. Moving the mouse over an image causes a sparking effect (done by changing the src of the image).

2. Moving the mouse over a top-level menu item causes a submenu to drop down (done by changing the style.visibility of the submenu).

3. Changing the CSS style by updating either the “id” attribute or the “class” attribute.

Implicit Hover:

The recorder tracks these mouse hovers over an IE document. Because implicitly recorded hovers may decrease resilience (a hover might be recorded on controls whose properties change over time), the attribute “ContinueOnError” is set to true for all implicitly recorded hovers. This ensures that when an implicit hover fails during playback, playback still continues with the next action.
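The effect of “ContinueOnError” on a playback loop can be sketched as below. This is an illustrative Python model only: the action dictionaries, the `continue_on_error` key and the `perform` callback are hypothetical names, not the engine's API.

```python
def play(actions, perform):
    """Run actions in order; tolerate failures only where the action allows it."""
    skipped = []
    for action in actions:
        try:
            perform(action)
        except Exception as exc:
            if not action.get("continue_on_error", False):
                # A failure on a normal action stops playback.
                raise
            # A failed implicit hover is noted and playback moves on.
            skipped.append((action["name"], str(exc)))
    return skipped
```

An explicitly recorded hover would simply carry `continue_on_error` as false, so its failure would stop playback.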

Explicit Hover:

If the hover action has to be played back without fail, the user can record an explicit hover by moving the mouse over the control and pressing the hover key combination (Ctrl + Shift + HoverKey). HoverKey is the character ‘V’ by default. If the user does not want hovers to be recorded implicitly, this option can be turned off through the mtm.exe.config or CUITBuilder.exe.config file. To change the hot key for explicit hover, change the HoverKeyModifier and HoverKey attributes.



Playback (or automation), very simply put, is all about finding a UI-object based on certain properties (say, a button named “OK” in a dialog named “Confirmation”) and performing some action on it (say, clicking the OK button). Hence there are two parts to this problem: “Search” and “Action”.

A typical playback of an action on an IE control goes through the following steps. We will look at each of the individual steps in the following sections.




Search for Top-Level IE Window:

Playback has to find the top-level IE window before it can search for controls within the IE document. This search is done using the MSAA plugin and happens in 3 passes.

Pass 1: Search only the visible windows on the desktop (no timeout), with SmartMatch turned off.

Pass 2: Search all visible and minimized windows, with a timeout of 15% of the total search timeout (the total is 2 minutes by default). SmartMatch is also turned off in this pass.

Pass 3: Search with both the minimized-windows option and SmartMatch enabled, with a timeout of (SearchTimeout – 15% of SearchTimeout).
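As a worked example of the time budget, the split can be computed as below. The sketch is in Python with a hypothetical function name, and it assumes the default total of 2 minutes mentioned above; integer milliseconds are used to keep the arithmetic exact.

```python
def pass_timeouts_ms(search_timeout_ms=120_000):
    """Split the total search timeout across the three window-search passes."""
    pass2 = search_timeout_ms * 15 // 100   # 15% of the total
    pass3 = search_timeout_ms - pass2       # the remaining 85%
    return {"pass1": 0, "pass2": pass2, "pass3": pass3}
```

With the default total, pass 2 gets 18 seconds and pass 3 gets the remaining 102 seconds.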

Search for IE Controls:

The first requirement for a search is to know what we are looking for. Playback uses the QueryID captured during recording for this. The QueryID specifies the UI-object hierarchy and the UIMap specifies the object properties.

Search in playback is a breadth-first search (BFS). The root of the tree is the desktop if no root is specified. Once the root is defined, the search is a BFS in the root element's sub-tree (the tree of the root element's framework). We first search for the top-level IE window using the MSAA plugin. To find the next element we look in the MSAA tree and navigate its sub-tree. Playback then switches to the IE plugin to find controls within the IE document. A normal search would find each query element in turn, from first to last, but this is not very resilient when the UI changes frequently. Hence, if playback does not find an intermediate node, it ignores that intermediate query element. The only requirement for a successful search is therefore finding the first and the last query elements.
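A rough model of this tolerant search is sketched below in Python. The nested-dictionary tree and the matching on a single `name` field are simplifying assumptions for illustration; the real search matches on recorded properties and crosses from the MSAA tree into the IE DOM.

```python
from collections import deque


def bfs_find(root, wanted):
    """Breadth-first search of root's descendants for a node named `wanted`."""
    queue = deque(root.get("children", []))
    while queue:
        node = queue.popleft()
        if node["name"] == wanted:
            return node
        queue.extend(node.get("children", []))
    return None


def find_by_query(root, query):
    """Resolve a query path; only the first and last elements are mandatory."""
    current = bfs_find(root, query[0])
    if current is None:
        return None
    last = len(query) - 1
    for i in range(1, len(query)):
        found = bfs_find(current, query[i])
        if found is None:
            if i == last:
                return None   # the target itself is missing
            continue          # skip a missing intermediate element
        current = found
    return current
```

If a frame recorded in the query no longer exists, the button is still found by searching from the last element that did match.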

While parsing the QueryID, a few of the primary properties (tag name, id, name) are identified and used as filters on the DOM element collection. Secondary properties in the query identifier are used to identify the control when more than one control matches the primary properties. Disambiguation proceeds through the secondary properties in order. For a given secondary property, if any control matches it, the controls that do not match are removed from the pool. If no control matches, the search moves on to the next secondary property with the pool unchanged. This is repeated until only one control remains or all the secondary properties are exhausted. If more than one control remains after all the properties have been used, the first control is chosen.
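The disambiguation loop described above can be written down in a few lines. This Python sketch assumes candidates are plain dictionaries and that secondary properties arrive as an ordered list of `(name, value)` pairs; both are illustrative choices, not the plugin's data model.

```python
def disambiguate(candidates, secondary_props):
    """Narrow a pool of primary-property matches using ordered secondary properties."""
    pool = list(candidates)
    for key, value in secondary_props:
        if len(pool) <= 1:
            break
        matching = [c for c in pool if c.get(key) == value]
        if matching:
            # Some control matched: drop the ones that did not.
            pool = matching
        # No control matched: keep the pool unchanged and try the next property.
    # If several controls remain, the first one is chosen.
    return pool[0] if pool else None
```

Because a property that matches nothing is ignored rather than treated as a failure, a single stale secondary property does not break the search.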

When the UI changes dynamically at runtime, there is no event in mshtml that notifies us of the change. Polling the mshtml DOM is therefore the only way to check whether a control has become available as a result of some code execution (an AJAX response or pure client-side JavaScript). To work around this, the IE plugin has a retry mechanism that repeats the search up to 3 times until it succeeds.
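A minimal polling sketch of this retry, in Python, is shown below. The `search` callback and the `delay_s` pause between attempts are assumptions for the example; the source only states that the search is repeated up to 3 times.

```python
import time


def search_with_retry(search, attempts=3, delay_s=0.0):
    """Call `search` until it returns a control or the attempts run out."""
    for attempt in range(1, attempts + 1):
        result = search()
        if result is not None:
            return result
        if attempt < attempts:
            # Hypothetical pause to give scripts on the page time to run.
            time.sleep(delay_s)
    return None
```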

Order of Invocation:

If we record on multiple IE windows that have the same title, playback needs to know which IE window it has to interact with. To solve this, we record another property called Order of Invocation, which indexes the similar IE windows by the order in which recording was performed on them. Order of Invocation is a filter property for identifying the top-level IE window.

Smart Match

Some of the search properties may change over time. The record and playback engine uses a Smart Match algorithm to identify controls if they cannot be located using the exact properties. The Smart Match algorithm uses heuristics and attempts to locate the control using variations on the search properties.

By default, Smart Match is enabled for finding top-level windows and controls within a top-level window. There may be cases where Smart Match finds the wrong control, so you can control when it is applied. You can turn off Smart Match using the following code snippet.

Playback.PlaybackSettings.SmartMatchOptions = SmartMatchOptions.None;

At the level of controls, Smart Match is applied in the following passes:

Pass 1: The id of the control is ignored while searching for the control.

Pass 2: The name of the control is ignored while searching for the control.

Pass 3: Both the id and the name of the control are ignored while searching for the control.

If playback can find a unique control by ignoring the properties mentioned above, the search succeeds; otherwise it reports a search failure.
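The three control-level passes can be modelled as below. This is an illustrative Python sketch under the assumption that controls and search properties are dictionaries and that the exact-match search has already failed; it is not the engine's heuristic code.

```python
def smart_match(controls, wanted):
    """Retry a failed exact search while ignoring id, then name, then both."""
    passes = [("id",), ("name",), ("id", "name")]
    for ignored in passes:
        keys = [k for k in wanted if k not in ignored]
        hits = [c for c in controls
                if all(c.get(k) == wanted[k] for k in keys)]
        if len(hits) == 1:
            # A unique control was found with the relaxed properties.
            return hits[0]
    return None   # search failure
```

Requiring a unique hit is what limits the risk of picking the wrong control once properties are being ignored.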

Ensure Visible

After a control has been found, it is not always simple to interact with it. The control may be off screen or out of focus, and actions on it may fail in that state. Hence, before performing any action, playback ensures that the control is in focus and visible on screen. For IE controls, the IE plugin scrolls using the mshtml API to bring the control into view.

Wait For Ready

The Wait For Ready (WFR) implementation in the IE plugin ensures that controls are searched for only when the document is in a state where user actions are possible. To ensure this, the following conditions are tracked:

1. The document to be interacted with is in the ready state "complete".

2. No page navigations are in progress.

3. No AJAX requests are in progress. The WFR code injects JavaScript into the HTML document to track AJAX requests.

4. The page is not refreshing.

5. No timer calls are pending. WFR waits for a maximum of 10 seconds for pending timer calls to complete.
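The five conditions can be combined into a single polling check, sketched below in Python. The state dictionary, its key names, the overall `timeout_s`, and measuring the 10-second timer allowance from the start of the wait are all assumptions for illustration; the injectable `clock` and `sleep` exist only to make the sketch easy to exercise.

```python
import time


def wait_for_ready(get_state, timeout_s=60.0, timer_grace_s=10.0,
                   poll_s=0.05, clock=time.monotonic, sleep=time.sleep):
    """Poll the page state until it is safe to search for controls."""
    start = clock()
    while True:
        state = get_state()
        elapsed = clock() - start
        ready = (state["ready_state"] == "complete"   # condition 1
                 and not state["navigating"]          # condition 2
                 and state["pending_ajax"] == 0       # condition 3
                 and not state["refreshing"])         # condition 4
        # Condition 5: pending timers block for at most timer_grace_s.
        timers_ok = state["pending_timers"] == 0 or elapsed >= timer_grace_s
        if ready and timers_ok:
            return True
        if elapsed >= timeout_s:
            return False
        sleep(poll_s)
```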

UI Synchronization

UI synchronization ensures that a playback action went through as expected. Sometimes a key input may not reach the UI (for example, because focus changed); in such cases we can retry. This is done by hooking into the process and watching the “Message Queue” for the UI object. UI synchronization is disabled in some cases, such as actions on disabled controls and SendKeys of modifier keys (Alt, Ctrl etc.) to a control.



In Part 1 of this blog we have seen how the IE plugin records and plays back actions on IE applications. In Part 2 we will look at the known issues and how the user can troubleshoot record and playback issues on IE applications.


Author – Praveen R

SDET – CodedUITest

Comments (1)

  1. cheater says:


    I have some problems with some tests… I have to test a web application developed in c#. My biggest problem is handling menu’s id codes. here is an example:

    Menu1 (menu_id_1)

    Submenu1.1 (menu_id_2)

    submenu1.2 (menu_id_3)

    Menu2 (menu_id_4)

    Submenu2.1 (menu_id_5)

    submenu2.2 (menu_id_6)

    In parentheses are the menu’s ids. But the application can be configured so that menu1 and all its submenus will not appear, and in this case the menu looks like:

    Menu2 (menu_id_1)

    Submenu2.1 (menu_id_2)

    Submenu2.2 (menu_id_3)

    I want to write some code in the UIMap class, but I don’t have any idea how… there are very many classes… In the UIMap class I want to build the menu code and assign it to a HtmlHyperlink variable

    and check if it exists, but every time the result is that the item does not exist. Can you help me to handle this situation? thanks