Posted by: Sue Loh
What use is a tool if you don’t know it’s there? One of the problems we have is getting the word out about the tools you can use to debug various kinds of problems. We honestly do try to figure out how to arrange our help documentation and the tools themselves in order to make them more discoverable. But, since my blog is also a place I can get the word out, I am taking advantage of the chance to share pointers to the tools we have for diagnosing the sources of performance problems.
Remote Kernel Tracker & CeLog
If you read my blog you are probably already familiar with these. Remote Kernel Tracker is the data display and visualization tool, CeLog is the logging engine, and several associated tools go along with them. The CeLog data tells you which threads are running, when, and for how long. Using CeLog and its associated tools, you can understand how threads in your system behave and how they interact with each other. You can find the top time consumers and relate events from your own code to the threads that were running. Most of the features of CeLog/KT are usable by ISVs as well as OEMs, and all the tools are in our SDK.
- I really tried hard to brain-dump everything I could into documentation, which is now online: http://msdn.microsoft.com/library/en-us/wcedebug5/html/wce50conEventTracking.asp
- I wrote an intro with screen shots and put them on my old blog: http://blogs.msdn.com/sloh/archive/category/4902.aspx
- And some more on this blog: http://blogs.msdn.com/ce_base/archive/category/10310.aspx
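To give a feel for the kind of question thread-switch data can answer, here is a minimal sketch of finding the top time consumers. It does not read real CeLog files: the event format (a list of timestamp and thread name pairs), the function name thread_run_times, and the thread names are all made up for illustration.

```python
from collections import defaultdict

def thread_run_times(switch_events, end_time):
    """Sum how long each thread ran, given (timestamp, thread_id) switch events.

    Hypothetical format, not the real CeLog layout: each event means
    "thread_id started running at timestamp", and that thread keeps running
    until the next event (or until end_time for the last event).
    """
    totals = defaultdict(int)
    # Pair every event with the timestamp of the event that follows it.
    next_times = [t for t, _ in switch_events[1:]] + [end_time]
    for (start, thread_id), stop in zip(switch_events, next_times):
        totals[thread_id] += stop - start
    # Sort so the biggest time consumers come first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Made-up log: timestamps in microseconds, invented thread names.
events = [(0, "ui"), (300, "driver"), (400, "ui"), (900, "idle"), (950, "driver")]
top = thread_run_times(events, end_time=1000)
print(top)
```

In practice the tools that ship alongside CeLog do this kind of summarizing for you; the sketch only shows why a log of thread switches is enough to rank threads by run time.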
Remote Call Profiler
The Remote Call Profiler, also known as “CE CAP” (CAP = Call Attributed Profiler), is an application profiling tool. To use it, you build your code with special flags that insert “instrumentation”: hooks into the profiler’s data collector. The hooks record every function entry and exit in the instrumented code, along with the time each function starts and stops. They can also detect nested function calls, so they record data about a whole call graph instead of just the individual functions. But the logging calls can distort the results somewhat: for example, the overhead of making the logging calls affects short-running functions much more than long-running ones. Remote Call Profiler is usable by anyone, but since Microsoft does not ship instrumented versions of its own code, it is typically more useful to ISVs (who are interested in their own application) than to OEMs (who are interested in the whole system).
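The entry/exit hook idea can be sketched in a few lines. This is only a toy illustration of instrumentation in general, not how the Remote Call Profiler is implemented: the names CallProfiler and FakeClock are hypothetical, and the fake clock stands in for a real timer so the numbers are repeatable.

```python
import functools
from collections import defaultdict

class FakeClock:
    """Stand-in timer (an assumption for repeatability); time moves only on advance()."""
    def __init__(self):
        self.now = 0
    def __call__(self):
        return self.now
    def advance(self, ticks):
        self.now += ticks

class CallProfiler:
    """Toy entry/exit instrumentation that builds a call graph."""
    def __init__(self, clock):
        self.clock = clock
        self.stack = []                    # functions currently executing
        self.inclusive = defaultdict(int)  # function -> ticks, callees included
        self.edges = defaultdict(int)      # (caller, callee) -> call count

    def instrument(self, func):
        name = func.__name__

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Entry hook: record who called us and when we started.
            caller = self.stack[-1] if self.stack else "<root>"
            self.edges[(caller, name)] += 1
            self.stack.append(name)
            start = self.clock()
            try:
                return func(*args, **kwargs)
            finally:
                # Exit hook: charge the elapsed time to this function.
                self.inclusive[name] += self.clock() - start
                self.stack.pop()
        return wrapper

clock = FakeClock()
profiler = CallProfiler(clock)

@profiler.instrument
def leaf():
    clock.advance(2)   # pretend to work for 2 ticks

@profiler.instrument
def parent():
    clock.advance(5)   # 5 ticks of its own work, then two nested calls
    leaf()
    leaf()

parent()
print(dict(profiler.inclusive), dict(profiler.edges))
```

With a real clock, the hooks themselves would take time on every entry and exit, and a function as short as leaf would have a much larger share of its measured time made up of that overhead than parent would. That is the distortion described above.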
Kernel Profiler
The kernel profiler, also known as the “Monte Carlo Profiler,” is a system-level profiling tool. It sets up a periodic interrupt, and during each interrupt it records the current value of the program counter. It accumulates a large number of hits, which statistically represent where the time is being spent. Since it could miss short function calls or activity happening at the same frequency as the profiler interrupt, its results could potentially be inaccurate, but in practice they are quite accurate. However, the kernel profiler requires OAL support and a specially built version of the kernel, and it requires all of the modules being profiled to be stored in ROM. So it is usable only by OEMs. Oh, and for some complicated reasons it’s not usable on Windows Mobile. Maybe I will blog more about that soon.
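Here is a small simulation of the sampling idea, including the case where activity lines up with the sampling frequency. It is a sketch under simplifying assumptions, not the kernel profiler’s code: the function sample_timeline, the "decode" and "blit" names, and the tick counts are all invented, and a repeating list of segments stands in for looking up the program counter.

```python
from collections import Counter

def sample_timeline(segments, period, total_ticks):
    """Sample a repeating timeline once every `period` ticks.

    `segments` is a list of (function_name, duration_in_ticks) that repeats
    forever; it stands in for mapping a program counter value to a function.
    """
    cycle_len = sum(duration for _, duration in segments)
    hits = Counter()
    for tick in range(0, total_ticks, period):
        offset = tick % cycle_len
        # Walk the segments to find the one covering this offset.
        for name, duration in segments:
            if offset < duration:
                hits[name] += 1
                break
            offset -= duration
    return hits

# Made-up workload: 7 ticks in "decode", then 3 ticks in "blit", repeated.
workload = [("decode", 7), ("blit", 3)]

good = sample_timeline(workload, period=3, total_ticks=3000)
aliased = sample_timeline(workload, period=10, total_ticks=3000)
print(good, aliased)
```

Sampling every 3 ticks gives hit counts in the same 70/30 proportion as the real time split. Sampling every 10 ticks, which matches the workload’s own period, always lands in "decode" and never sees "blit" at all.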
Remote Performance Monitor
I listed this tool because some people would expect it to be here, but I have never found the Remote Performance Monitor to be very useful. It is a port of the Windows tool that you can find in Control Panel / Administrative Tools / Performance. I actually believe this tool has a fair amount of potential. But there aren’t enough counters, the counters we have are not very accurate, it’s hard to figure out how to write new counters, and the tool itself seems to affect system performance a lot. I hope to change that in the future. Anyway, it is usable by both ISVs and OEMs.
3rd Party Profilers
Intel makes a statistical profiler called VTUNE that is similar to the kernel profiler. I haven’t used it much, but it has some benefits that the kernel profiler doesn’t. It can get statistics down to the source line instead of just to the function level. It doesn’t have the kernel profiler’s requirement that the code be in ROM. I expect it’ll work with Windows Mobile, though I haven’t confirmed that. It can also tie into the hardware performance counters on Intel CPUs. (More about those below.)
AMD has a similar product called CodeAnalyst, which I have never used, but I think it is very similar to VTUNE.
Hardware Performance Counters
There are classes of performance problems that CeLog and profilers cannot diagnose. One notable problem is poor cache and TLB efficiency. If you take a cache miss or a TLB miss, there’s no way the OS can measure that**. For information at that level you need help from the CPU. Many CPUs have registers that can record statistics such as memory accesses and cache misses, which can be used together to determine the cache efficiency. Some of those CPUs can also generate interrupts on counter overflow, which means you could build a profiler to catch heavy sources of cache and TLB misses. However, the registers vary from chip to chip, even from the same vendor. For example, the Intel PXA250 and IXP425, both ARM chips, have different hardware performance counters. So to take advantage of them you’d have to grub through manuals and write CPU-specific code. It is possible if you have the heart for it. Otherwise, in some cases you can take advantage of profilers from the CPU vendor; for example, Intel’s VTUNE profiler can make use of hardware performance counters.
**Actually, SHx and MIPS, which use software TLB miss handlers, could measure TLB misses, and indeed on MIPS the kernel profiler takes advantage of that to report TLB miss timings. On SH the profiler only reports miss counts, not timings.
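Once you have counter readings, turning them into a cache efficiency number is simple arithmetic. The sketch below assumes you have already read a memory-access counter and a cache-miss counter before and after the code of interest. How you read them is CPU-specific and not shown. The 32-bit width, the function names, and the sample values are assumptions for illustration, not taken from any particular chip.

```python
COUNTER_BITS = 32  # assumption: counters are 32 bits wide and wrap silently

def counter_delta(before, after, bits=COUNTER_BITS):
    """Difference between two readings, tolerating a single wraparound."""
    return (after - before) % (1 << bits)

def miss_rate(accesses_before, accesses_after, misses_before, misses_after):
    """Fraction of memory accesses that missed the cache during the interval."""
    accesses = counter_delta(accesses_before, accesses_after)
    misses = counter_delta(misses_before, misses_after)
    return misses / accesses if accesses else 0.0

# Made-up readings; the access counter wraps past 0xFFFFFFFF in between.
rate = miss_rate(0xFFFFFF00, 0x00000300, 1000, 1050)
print(f"miss rate: {rate:.2%}")
```

The modulo in counter_delta is what handles the wrap: the access counter went from 0xFFFFFF00 to 0x300, which is 1024 accesses rather than a huge negative number. If a counter can wrap more than once between readings, you would need the overflow interrupt mentioned above to keep count.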
PerfMan (Windows Mobile 5 only)
PerfMan isn’t really a separate tool from those listed above. It’s a UI for controlling CeLog and the kernel profiler that is included with Windows Mobile. If you set IMGPERF=1, a perf package is included in your image, and that package contains the PerfMan UI. PerfMan is perfect if you need to collect data on devices that don’t have a KITL connection.
I’ll write about memory tools too but this is enough for tonight.
Update Feb. 28, 2006: I just learned of a whole set of information on performance counters for managed code. See http://blogs.msdn.com/davidklinems/archive/2005/12/09/502125.aspx.
Update Nov. 30, 2006: I also just learned of another 3rd party profiler, Speed Demon. http://www.noctemware.com/speeddemon.html