Measuring TLB misses


Posted by: Sue Loh

 

Question: How can I measure TLB misses on Windows CE? 

Answer:

 

If you are running on a MIPS or SH processor, CeLog will capture counts of TLB misses.  On each thread switch it records a “TLB miss” event containing the number of TLB misses that occurred since the previous thread switch.  That’s a lot less informative than you might be hoping for: it won’t tell you the exact times of the TLB misses, or which addresses they happened on.  You’d have to manually instrument the kernel’s TLB miss handler to get that.  Kernel Tracker doesn’t do a good job of displaying TLB misses either, so you’d be better off running readlog.exe.  If you use the “-s -v” parameters to get a verbose summary, it adds up TLB misses per thread, for example:

 


0x0002E942  0x00013242  0x00013242  0:00:00.099.172  Active   shell.exe
        Time in process 0x00013242: 0:00:00.099.172  (100.0% of thread run-time)
                     Total TLB Misses:      500
                Total thread switches:       25
  min TLB misses in one thread switch:        2
  max TLB misses in one thread switch:       35
                    TLB Misses/switch:       20.00
                   Total run time, ms:       99
                        TLB Misses/ms:        5.05
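
The summary figures are simple aggregates of the per-switch counts.  Here is a minimal Python sketch of that arithmetic, assuming the per-switch counts for one thread have already been pulled out of the log.  The function name and the input list are hypothetical, and the sample counts are synthetic values chosen only so that they add up to the totals shown above.  This is not readlog's actual implementation or real log data.

```python
def summarize_tlb_misses(per_switch_counts, run_time_ms):
    """Aggregate per-thread-switch TLB miss counts for one thread."""
    if not per_switch_counts:
        raise ValueError("need at least one thread switch")
    total = sum(per_switch_counts)
    switches = len(per_switch_counts)
    return {
        "total": total,
        "switches": switches,
        "min": min(per_switch_counts),
        "max": max(per_switch_counts),
        "per_switch": total / switches,
        "per_ms": total / run_time_ms,
    }


# Synthetic counts: 25 thread switches adding up to 500 misses.
sample_counts = [2, 35] + [20] * 20 + [21] * 3
summary = summarize_tlb_misses(sample_counts, run_time_ms=99)
print(f"TLB Misses/switch: {summary['per_switch']:.2f}")
print(f"TLB Misses/ms:     {summary['per_ms']:.2f}")
```

Note that the per-millisecond figure divides by the thread's run time, not by wall-clock time, so a thread that rarely runs can still show a high miss rate.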



On ARM and x86, the OS has no knowledge of TLB misses, because TLB misses are handled completely in hardware.  There is no software TLB miss handler that the OS could use to record the events.  The only way to find out about TLB misses on ARM/x86 is through hardware performance counters, if the CPU has any that measure them.  Performance counters vary from one manufacturer to another and from one CPU to another, so look in the reference manual from the CPU manufacturer.  If they are supported, you’ll have to write CPU-specific code to access them.
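
Whatever counter a given CPU exposes, the measurement pattern tends to be the same: read the counter, run the code of interest, read it again, and subtract while allowing for wraparound.  The Python sketch below illustrates only that pattern.  The `read_counter` callable is a hypothetical stand-in for the CPU-specific access code, and the 32-bit counter width is an assumption, so check the reference manual for the real width and access method.

```python
from typing import Callable


def counter_delta(before: int, after: int, width_bits: int = 32) -> int:
    """Events between two reads of a free-running counter that wraps
    at 2**width_bits (assumes at most one wrap between the reads)."""
    return (after - before) % (1 << width_bits)


def measure(read_counter: Callable[[], int],
            workload: Callable[[], None],
            width_bits: int = 32) -> int:
    """Count events that occur while workload() runs.

    read_counter is hypothetical: on real hardware it would wrap the
    CPU-specific code that reads the performance counter.
    """
    before = read_counter()
    workload()
    after = read_counter()
    return counter_delta(before, after, width_bits)


# Demo with a fake counter that returns two canned readings.
fake_reads = iter([100, 140])
print(measure(lambda: next(fake_reads), lambda: None))

# A counter that wrapped between the two reads still gives a sane delta.
print(counter_delta(0xFFFFFFF0, 0x00000010))
```

The delta also includes any misses caused by other threads or interrupts that run during the measured interval, so it is best treated as an upper bound for the workload itself.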

 

Comments (6)

  1. ce_base says:

    By the way, measuring cache misses is similar: on all CPUs, they are handled entirely in hardware, so your CPU would have to expose hardware performance counters for you to measure them.  Again, look in the reference manual from the CPU manufacturer.

    Sue

  2. ce_base says:

    Oh yeah!  Also, on MIPS the Monte Carlo profiler will tell you what % of profiler hits occurred during TLB miss handling, so you can find out what % of time is spent handling TLB misses.

    On SH it wasn’t possible to implement that, unfortunately.  I tried and concluded there was no way to do it.

    And on ARM/x86 it’s not possible since there’s no software TLB miss handling.

    Sue

  3. Alex says:

    Hi Sue,

    thanks for this informative post.

    Do you think it is possible to use the MIPS software TLB miss handler to expand the 512 MB (physical) RAM limit of WinCE?

    I found this article which could be the basis for that: http://support.microsoft.com/kb/841192/en-us, but I am not sure of the implications of such a change for the OS itself…

    Thanks,

    Alex

  4. ce_base says:

    Well I’m no expert on what kind of tricks you can play in the TLB miss handler, but I don’t think so.  It is not the TLB miss handler that is creating the 512MB limit.  It’s the way the kernel maps physical memory twice (once cached and once uncached) into a 1GB portion of the kernel virtual address space.  I don’t really know why it’s done that way but I think it’s more than a trivial change to make it act differently.  Sorry,

    Sue

  5. Sarang Padalkar says:

    > On ARM and x86, the OS has no knowledge of TLB misses, because TLB misses are handled completely in hardware.  There is no software TLB miss handler that the OS could use to record the events.

    The ARM926 has a "Debug Override Register" that can be used to specify that a TLB miss should be implemented in software (through a data/instruction abort). The instruction is "MCR{cond} p15,0,<Rd>,c15,c0,0".

    Even if it doesn’t have hardware support, is it not possible to implement this by having an "invalid" page table entry in memory and recording this in the instruction/data abort exception handlers?

  6. ce_base says:

    Thank you for that tip about the ARM trick.  I don’t know when I might get to use it but it’s good to know.  Generally TLB misses are a much bigger issue on MIPS and SH, since software TLB miss handling is so much slower than hardware handling.

    I don’t think we could implement it with page tables, because I don’t think we can control what entries are in the TLB.  Or, I’m not creative enough to figure out how to do it in the short time I’ve thought about it.  🙂 At most I think we could measure page faults but that’s not the same as TLB misses.

    Sue
