Posted by: Sue Loh
If you read our blog regularly then you already know that CeLog is a data logging tool, which provides a lot of interesting system data because we’ve made the kernel log CeLog events for occurrences like thread switches. You should also know that the kernel “Monte Carlo” profiler operates by periodically taking an interrupt and recording the program counter of the current running thread. Well, several years ago (CE 4.2 days) we had an idea — if we saved all the Monte Carlo profiler hits in the CeLog data stream, we could post-process the CeLog data and break down the profiler hits per thread. Because we can use the thread switch events to know which thread was running at the time the profiler hits were recorded. So, I went and changed the profiler to have a “CELOG” mode where it records the program counter to CeLog instead of buffering it. And I built a desktop-side tool to post-process the CeLog log file and total up all of the profiling hits. Thus “perfalyzer” was born. (The name was an imitation of an earlier CeLog helper tool, “memalyzer,” which totaled up CeLog events for heap usage to catch heap leaks.)
An example (http://www.thelohs.net/sue_blog/overall.htm) can probably explain it better. Note that the first two displays on the main page (overall.htm) are the same as the original Monte Carlo profiler output. The third display is a breakdown of hits per thread. Clicking on a thread will take you to a different page (thread.htm) where that thread’s hits are broken down per symbol, showing where that particular thread spent its time. Clicking on the symbol report will take you to a similar report (symbol.htm), where each symbol’s hits are broken down per thread, showing which threads were calling that symbol most.
So the initial purpose of perfalyzer was to total up a per-thread view of our traditional Monte Carlo data. Which was nice, but hasn’t turned out to be all that pivotal in most of our performance investigations. Instead, perfalyzer took on a whole new role that was different from its initial intention. Because perfalyzer has an advantage over our traditional Monte Carlo profiler: as a desktop-side tool, it looks up the symbols for all of the profiling hits using the release directory contents instead of the device-side symbols that Monte Carlo uses. You see, when you enable profiling in your image (set IMGPROFILER=1), you include the Monte Carlo profiler into the kernel AND include all the symbols for all the executables in the “MODULES” section of ROM in a big table inside the image. The profiler uses that big symbol table to convert addresses like “0x01234567” to symbols like “foo.dll!DoSomethingSlow()”. That is why the profiler reports “unknown” for any hits that are in modules that are not in the MODULES section of ROM. Because those symbols are not in the big table.
The reason perfalyzer can look up symbols is that among the kernel events we log to CeLog, we log module loads (DLL loads) and process creation (EXE loads). That is the information that’s required to know which DLL/EXE a profiling hit occurs inside. Once you know the start address of the executable, you can figure out what offset inside the DLL/EXE the hit was at, and use the .map files from the release directory to look up symbols. Exactly the way I describe in my old blog post, http://blogs.msdn.com/sloh/archive/2005/02/28/381706.aspx.
Why It’s Interesting
Perfalyzer has the advantage that it can look up symbols in ANY executable, not just one from the MODULES section of ROM. Further, it allows you to run the profiler without the presence of the big symbol table in ROM. And for those reasons, it has evolved into our solution for a lot of profiling cases:
- Windows Mobile 2003 had a restriction that the kernel XIP region could not exceed a limit, something like 2MB. Well, the symbol table easily exceeds 2MB for all the code that’s in ROM. So setting IMGPROFILER=1 would cause image build (makeimg) failures. The only way to use the profiler was in combination with CeLog.
- Windows Mobile 2005 introduced a new technology, Image Update, that really is very important. But with all the changes to how images are built, it became unsupported to add the profiler symbol table to ROM. So yet again, it was impossible to use the profiler except with CeLog.
- The Monte Carlo profiler only works over a KITL connection. Sometimes you want to profile on a stand-alone device that’s not connected to Platform Builder. In that case you’d want to use the profiler with CeLog and a flush tool like CeLogFlush.exe or the WM5 PerfMan helper tool.
Now here’s the problem. I made perfalyzer as a proof of concept, as an internal tool. It has never gotten any testing, and furthermore, our test team just hasn’t had the bandwidth to pick up testing for it. It has had plenty of bugs and I’m sure there are more. So it isn’t in a state where we can really ship and support it. That means, for anyone using our last couple of versions of Windows Mobile, or using Image Update, there is no official supported way to do Monte Carlo profiling. 🙁
How to Use It — Unofficially
If you are using Platform Builder, you will find that perfalyzer snuck into the tool set that you get. It should be present at
Now as I said, it is officially unsupported. Hopefully we can change that someday soon, or replace it with something better, but for now you are on your own. It might have security vulnerabilities. It might crash your hard drive. It might attack your dog or turn you into a conservative voter. If you find bugs, well, I might have the time to help fix them, but I cannot promise anything. You might find the tool bad enough to be completely unusable. I make no promises. “Most” of the time it works “well enough.”
Now that that’s over with. Whew! Here are basic steps to use it. These instructions assume that you are building from a CE OS build window, and that you’re using a KITL connection. They are only a single example of how you can use these two tools together. You can use them in a variety of ways — instructions that are not essential are noted. Look up the documentation for the Monte Carlo profiler and CeLog for more information.
- First you must build a profiling-enabled OS image. No other specific components or build are required. [NOTE: this means that ONLY an OEM can use these tools because only an OEM can change the image.] Your app does not need to be built any differently than normal. On the command line:
- Decide how you’re going to start CeLog. There are a bunch of ways to do it. You can cause CeLog to be started automatically when your device boots:
Or, alternatively, you can start CeLog after the fact on an already running system, by launching celogflush.exe which background-flushes the log periodically.
- Boot your device as normal.
- Start CeLog if you did not set it up to run at boot. Zone settings do not matter (set zones to 0 if you wish to minimize the log file size).
- Start the profiler in CeLog mode. When the profiler starts, it will change the CeLog zone settings to prevent other non-profiling events from being stored in the log.
- Option 1: Type “
prof on -l” in the Target Control Window, to start profiling in CeLog mode.
- Option 2: Hit the F8 key until the profiler is toggled to CeLog mode. The current output mode will be printed to debug output. Hitting this key does not start the profiler. Start the Monte Carlo profiler by hitting the F9 key.
- Option 3: Control the profiler from an application using the ProfileStart API. Make sure to use the new PROFILE_CELOG flag with ProfileStart.
- Option 1: Type “
- Run your test application.
- Stop the profiler. You will get some debug output, but the profiler data is in the CeLog stream, so you won’t see profiler results in the debug output. When the profiler stops, it will restore the original CeLog zone settings that were active before profiling was started.
- Option 1: Type “prof off” in the Target Control window.
- Option 2: Hit the F12 key to stop the profiler.
- Option 3: Stop the profiler from within an application by using the ProfileStop API.
- Option 1: Type “prof off” in the Target Control window.
- Now that you’re done gathering data, go to your release directory. The data will be stored in celog.clg. (Note that the flush app may take a few seconds to flush the data to the file.) Copy or rename the file, or disconnect PB, so that the file is no longer receiving new data. Run perfalyzer.exe on the file:
You should run the tool on the log file while it is in your release directory, because the tool uses the .map files inside your release directory to look up symbols. Otherwise you can use the “-map” parameter to tell it where to find the .map files. The output will be placed into a \Perf subdirectory inside the current directory. Open up the overall.htm file to see the results.
- The thread named “0x00000000” is a false thread that represents the processor being idle. You can look at its hits to see a little bit of “jitter” in the thread logging, because the profiler catches some samples where the scheduler is between thread switches.
- Totals are printed at the bottom of each list of hits. Percentages should add up to 100%.
- CeLog can be used to collect Monte Carlo data as well as ObjectCall data. It does not collect KCall data. To profile KCalls, make sure the profiler is not in CeLog mode.
- Perfalyzer can display a maximum of ~200,000 profiler samples. Beyond that it loses all samples. This means that you may need to use the “
readlog -copy -range” utility to break the log file into smaller pieces. This is one of those things I’d fix if we were to make it a “real” supported tool.
- Perfalyzer does not currently handle having multiple starts and stops in a single log file. It will print a warning, and you can use the “
perfalyzer -splitlog” utility to break out a new log with the second profiler run in it.
- You can also use the “perfalyzer -noflush” option to make perfalyzer produce a report that excludes the samples due to the CeLogFlush.exe or OSCapture.exe flushing applications.