Add-on Performance Part 3: Optimizing Add-on Startup Performance

In the first post of this series, we described how add-ons can decrease IE’s performance during tab creation. Many users with add-ons enabled have noticed a performance improvement when they open new tabs after disabling their add-ons. We also walked you through how to measure add-on performance and identify areas of impact using the Windows Performance Tools.

In this post, we delve into methods and best practices developers can use to improve their add-ons’ start-up performance, first time and every time. A great place to start is by minimizing the number of DLLs and dependencies associated with an add-on and by delay-loading these DLLs where possible. We show the bare minimum of code an add-on needs to run during initialization, and we show how we used the SetSite() implementation in the example below to build a toolbar with an average load time of only 0.01 seconds in IE. We’ll also show you how to investigate add-on performance problems using the Windows Performance Tools, and we’ll use the trace data to highlight the performance footprints that result from several common add-on coding mistakes.

Once again, we ask add-on developers to use the information from this post and start applying performance optimizations today. A faster load time also sets your add-on apart from slower add-ons when users compare them in the Manage Add-ons window.

Revisiting Add-on Tab Creation

To bring readers up to speed, let’s first review how add-ons affect IE during tab creation. Every time the user creates a new tab, IE initializes add-ons to make sure that they can interact with a webpage properly. In fact, when the user launches a new IE window one of the first things IE does is create a new tab that, in turn, initializes all of the user’s add-ons in sequence. The time it takes each add-on to initialize is its load time.

When IE initializes an add-on, it first calls the CoCreateInstance() function on the add-on’s ClsID, which in turn invokes the add-on module’s DllGetClassObject() function to create an object in memory. Add-ons do not typically incur a performance delay during this function call. It’s nevertheless important to focus on this function call while optimizing startup performance since we’ve found examples of slow performance in some of the most popular add-ons.

Once the add-on object has been instantiated, IE passes a pointer to the IWebBrowser2 object to the add-on via the SetSite() function. This function facilitates the add-on’s initial communications with Internet Explorer and is exposed by the IObjectWithSite interface, which all IE add-ons must implement. Add-ons typically run their initialization routines in this function, such as displaying toolbar UI or loading other modules.

If the add-on being initialized is a Toolbar or Explorer Bar, IE calls the add-on’s ShowDW() function to make the add-on visible on the browser window. Some add-ons choose to run their UI rendering code within this function so it can also impact startup performance.

A Simple SetSite Implementation

The majority of performance issues during add-on startup occur within SetSite(). We encourage developers to do as little work as possible in this function. This is the best way to optimize for a fast load time. The following code shows a sample ATL implementation of a toolbar’s SetSite() function. There are several key points worth noting:

  1. This implementation is written with IE-native technologies (C++, Win32). We recommend building IE add-ons using these technologies instead of platforms that have a high up-front initialization or working set cost, such as Silverlight and managed code.
  2. This implementation performs the minimum amount of work required to complete the initialization, ensuring that the toolbar returns from the SetSite() call as soon as possible. Add-on developers should minimize the number of DLLs and external dependencies associated with an add-on and delay-load DLLs where possible.
  3. The toolbar doesn’t perform any of the network operations or disk-intensive operations (such as registry accesses) commonly observed in the SetSite() calls of certain add-ons.
  4. When you close a tab or the IE window, IE calls SetSite(NULL) on the add-on. Minimize the work in this part of the function call as well to ensure that tabs close quickly.
  5. Some add-ons need to handle DWebBrowserEvents2 events (such as DocumentComplete) and run code. If your add-on doesn’t need to handle any events, then you don’t need to sink them in SetSite().

// IObjectWithSite --------------------------------------------------------
STDMETHODIMP CContosoBand::SetSite(IUnknown *punkSite)
{
    if (punkSite != NULL)
    {
        // Initialize the toolbar.
        CComPtr<IOleWindow> spOleWindow;
        HRESULT hr = punkSite->QueryInterface(IID_PPV_ARGS(&spOleWindow));
        if (SUCCEEDED(hr))
        {
            hr = spOleWindow->GetWindow(&_hwndParent);
            if (SUCCEEDED(hr))
            {
                // Create a standard Windows comctl32 toolbar control and add a few buttons to it
                hr = _toolbar.Init(_hwndParent);
            }
        }

        // Store off IWebBrowser2 and sink DWebBrowserEvents2
        if (SUCCEEDED(hr))
        {
            CComPtr<IServiceProvider> spServiceProvider;
            hr = punkSite->QueryInterface(IID_PPV_ARGS(&spServiceProvider));
            if (SUCCEEDED(hr))
            {
                hr = spServiceProvider->QueryService(SID_SWebBrowserApp, IID_PPV_ARGS(&_spBrowser));
                if (SUCCEEDED(hr))
                {
                    // Establish a connection to the DWebBrowserEvents2 event source
                    hr = DispEventAdvise(_spBrowser);
                    if (SUCCEEDED(hr))
                    {
                        _fAdvised = true;
                    }
                }
            }
        }
    }
    else
    {
        // Terminate the connection to the browser event source
        if (_fAdvised && _spBrowser != NULL)
        {
            DispEventUnadvise(_spBrowser);
            _fAdvised = false;
        }

        // Tear down the toolbar window and clear out the pointers
        _toolbar.Destroy();

        _spBrowser.Release();
    }

    return IObjectWithSiteImpl<CContosoBand>::SetSite(punkSite);
}

A toolbar that we built using the above SetSite() implementation only incurred a 0.01 second average load time in IE. If you mirror your add-on’s implementation as described above, you will be able to minimize your add-on’s impact to IE’s tab creation performance.
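One way to keep initialization this short as an add-on grows is to defer expensive setup until a feature is first used. The following is a minimal sketch of that idea in portable standard C++, not IE or ATL code. The names `ExpensiveResource`, `LazyBand`, `Initialize()` and `Settings()` are hypothetical and stand in for whatever costly state and entry points a real add-on has.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for costly state (extra DLLs, settings, caches).
struct ExpensiveResource {
    static int loadCount;   // counts how many times the costly load ran
    std::string settings;
    ExpensiveResource() : settings("loaded") { ++loadCount; }
};
int ExpensiveResource::loadCount = 0;

// Hypothetical add-on object: initialization stays cheap, and the costly
// resource is created only when a feature first needs it.
class LazyBand {
public:
    // Plays the role of SetSite(): record what is needed and return quickly.
    void Initialize() { initialized_ = true; }

    // Called from a user-driven code path, for example a button click.
    const std::string& Settings() {
        if (!resource_) {
            // First use only: pay the cost here, not during initialization.
            resource_ = std::make_unique<ExpensiveResource>();
        }
        return resource_->settings;
    }

    bool IsInitialized() const { return initialized_; }

private:
    bool initialized_ = false;
    std::unique_ptr<ExpensiveResource> resource_;
};
```

With this shape, calling `Initialize()` never touches `ExpensiveResource`; the load happens once, on the first call to `Settings()`. Whether a given piece of work can be deferred this way depends on the add-on, so treat it as a pattern to evaluate rather than a rule.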

Investigating Add-on Performance using Windows Performance Tools

As you add functionality to your add-on, you may need to perform additional operations during add-on initialization. We encourage you to continually monitor your add-on’s performance using the Windows Performance Tools (xperf) or via IE’s Manage Add-ons window. If you find performance regressions, it’s important not to rely on guesswork to find the root cause. Instead, you can use the Windows Performance Tools to profile your add-on’s performance, investigate the traces and identify the regressions.

In the following examples we’ll show you how to profile the startup performance for your add-on. The first step involves collecting a performance trace of the add-on during tab creation.

  1. Make sure your add-on is enabled (and visible if your add-on is a toolbar). For the best results, disable the other add-ons installed in the browser.
  2. From an elevated command prompt, execute the following command (assuming you’ve added the xperf folder to your system path) to start the trace:
    xperf -on latency+dispatcher -stackwalk profile+cswitch+readythread -buffersize 128 -minbuffers 300 -maxbuffers 300 -start browse -on Microsoft-IEFRAME:0x100
  3. Launch IE and open several new tabs. Be sure to wait long enough between actions for all the necessary code to finish executing. You can open more new tabs to get more data points out of the trace.
  4. Stop the trace and log the results to a file (such as toolbarprofile.etl):
    xperf -stop browse -stop -d toolbarprofile.etl
  5. Once you’ve collected a trace, you can use the Windows Performance Analyzer (xperfview) to display several graphs showing a timeline of data covering the period recorded by the trace:
    xperfview toolbarprofile.etl

Main page for Windows Performance Tools

The Windows Performance Analyzer contains many different graphs based on the data from your trace. You can strip it down to only the relevant graphs for more convenient analysis.

Let’s zoom into the events that correspond to the add-on’s initialization (and un-initialization). We recommend developers analyze add-on performance for all of the following events:

| Scenario | Start event | End event |
|---|---|---|
| Tab Creation (load time) | CoCreateInstance(): Microsoft-IEFRAME/ExtensionCreate/Start | CoCreateInstance(): Microsoft-IEFRAME/ExtensionCreate/Stop |
| | SetSite(): Microsoft-IEFRAME/ExtensionSetSite/Start | SetSite(): Microsoft-IEFRAME/ExtensionSetSite/Stop |
| | (Toolbars/Explorer bars only) ShowDW(): Microsoft-IEFRAME/ExtensionShowDW/Start | (Toolbars/Explorer bars only) ShowDW(): Microsoft-IEFRAME/ExtensionShowDW/Stop |
| Tab Close | SetSite(NULL): Microsoft-IEFRAME/ExtensionSetSiteNull/Start | SetSite(NULL): Microsoft-IEFRAME/ExtensionSetSiteNull/Stop |

For this example we’ll focus on analyzing the events that account for the load time (ExtensionCreate, ExtensionSetSite, ExtensionShowDW) of our toolbar.

Right-click the GenericEvents graph and select Summary Table. You’ll see a window that lets you identify when your events fired in the trace and how long they took. Use the selector on the left to arrange your columns in the following order, which makes the data easier to read. If you find this column sequence helpful, you can create a summary table profile for it via the Save View… option from the View menu:

Process Name, Provider Name, Field1, Yellow Bar, Count, Time (s) (sort entries by this column), Task Name, Opcode Name

Expand the field for the iexplore.exe process, then the Microsoft-IEFRAME provider, and finally the field corresponding to the ClsID for your add-on. Scroll down until you find the win:Start and win:Stop event pairs for each of the initialization events. The overall time delta from ExtensionCreate-Start to ExtensionShowDW-Stop is the add-on’s load time, which in this case is only 0.01 seconds:

Summary table view from Windows Performance Tools
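The load-time arithmetic can be sketched in a few lines of C++. This is only an illustration, assuming you have copied the event rows for a single tab creation out of the Summary Table by hand. The `TraceEvent` struct and the `TryGetLoadTimeSeconds` function are hypothetical names and are not part of xperf. Passing "ExtensionSetSite" as the last task for add-ons that have no ShowDW() call is also an assumption, based on ShowDW() applying only to toolbars and Explorer bars.

```cpp
#include <string>
#include <vector>

// Hypothetical record for one row copied from the Summary Table.
struct TraceEvent {
    std::string task;     // Task Name, e.g. "ExtensionCreate"
    std::string opcode;   // Opcode Name: "win:Start" or "win:Stop"
    double timeSeconds;   // value from the Time (s) column
};

// Computes the load time for ONE tab creation: the delta between the
// ExtensionCreate start and the stop of the last initialization task.
// Returns false if either event is missing or the timestamps are out of order.
bool TryGetLoadTimeSeconds(const std::vector<TraceEvent>& events,
                           double& loadTimeSeconds,
                           const std::string& lastTask = "ExtensionShowDW") {
    bool haveStart = false;
    bool haveStop = false;
    double start = 0.0;
    double stop = 0.0;
    for (const TraceEvent& e : events) {
        if (!haveStart && e.task == "ExtensionCreate" && e.opcode == "win:Start") {
            start = e.timeSeconds;   // first ExtensionCreate start in the list
            haveStart = true;
        } else if (e.task == lastTask && e.opcode == "win:Stop") {
            stop = e.timeSeconds;    // last matching stop in the list
            haveStop = true;
        }
    }
    if (!haveStart || !haveStop || stop < start) {
        return false;
    }
    loadTimeSeconds = stop - start;
    return true;
}
```

If the trace covers several tab creations, feed the function the rows for one tab at a time; otherwise the result spans from the first tab's start to the last tab's stop.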

You can select the time region in the main window that corresponds to your add-on’s load time. Go to the main window, right-click on any graph and choose “Select Interval…”. Enter the values from the ExtensionCreate/Start and ExtensionShowDW/Stop events in the Summary Table (hint: right-click on the cell containing the time for each event and select “Copy Cell”).

Once you hit enter on the Select Interval window, the specified time range will be selected in the main window. You can replicate that selection to all of the graphs by right-clicking on the selection and clicking “Clone Selection”. You can then zoom into that range for all graphs.

With the time interval set up, you can start looking at the individual views to investigate performance. For example, you can look at the CPU Usage by Process graph to see how much CPU time iexplore.exe was using during your add-on’s startup. You may need to use the selector on the left to enable the views. Here’s what the CPU Usage by Process graph looks like for the toolbar that we just profiled. Note that a value of 25% usage corresponds to using 100% of one CPU in a four-CPU system:

Graph that shows the CPU usage for the iexplore.exe processes when a typical add-on is initializing.

CPU Usage by Process (iexplore.exe processes only)
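The scaling between the graph's system-wide percentage and a single CPU is simple multiplication, sketched below. The function name `SingleCpuEquivalentPercent` is hypothetical, and the sketch assumes the graph reports usage as a share of all CPUs combined, as the 25% example implies.

```cpp
#include <stdexcept>

// Converts a system-wide CPU percentage (as plotted in the CPU Usage by
// Process graph) into the equivalent share of a single CPU.
double SingleCpuEquivalentPercent(double systemWidePercent, unsigned cpuCount) {
    if (cpuCount == 0) {
        throw std::invalid_argument("cpuCount must be greater than zero");
    }
    return systemWidePercent * cpuCount;
}
```

For example, `SingleCpuEquivalentPercent(25.0, 4)` yields 100.0, which is why a flat 25% line on a four-CPU machine usually means one thread is fully busy.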

Graphical View of Common Performance Problems

For comparison, we’ve captured traces for two add-ons that have known performance problems. You can see the differences in the graphs and the length of the initializations.

Graph showing the CPU usage if an add-on is waiting for a network call to return.

Sample Add-on 1 – Network calls during initialization (iexplore.exe CPU usage only)

This add-on takes around 0.27 seconds to load, but most of that time is spent waiting for a network call to respond, as evidenced by the lack of CPU activity throughout the initialization.

Graph showing the CPU usage for an add-on performing lots of registry operations during initialization

Sample Add-on 2 – Registry operations during initialization (iexplore.exe CPU usage only)

This add-on takes around 0.3 seconds to load. IE consistently uses 25% of the CPU throughout the initialization while reading and writing registry values.

Summary

Tab creation performance is a great measure of users’ overall browsing satisfaction. Because so many of our users have add-ons installed today, improving add-on startup performance is critical to ensuring that they can continue to enjoy the benefits of add-ons while IE stays fast.

We encourage add-on developers to take the guidance from this post and apply it to their add-ons. Design the initialization routines for your add-ons from the ground up with performance in mind. Start by minimizing the number of DLLs that need to be loaded by an add-on and delay-loading these DLLs where possible. Use the Windows Performance Analyzer to profile your add-on’s performance and spot issues to optimize, instead of speculating about the problem areas in code. From measurement to investigation to optimization, add-on performance engineering is an iterative process.

Developers can use these tools and optimizations for other scenarios as well, such as navigation, add-on un-initialization, etc. If you find useful optimization methods or have any questions about analysis steps, feel free to post comments here. Start optimizing add-on performance today!

Herman Ng
Program Manager

Useful links

Previous Blog Posts on Add-on Performance