How do I determine the processor’s cache line size?


When doing high-performance computing, you need to worry about the CPU cache line size in order to avoid issues like false sharing. But how can you determine the processor's cache line size?

The GetLogicalProcessorInformation function will give you characteristics of the logical processors in use by the system. You can walk the array of SYSTEM_LOGICAL_PROCESSOR_INFORMATION structures returned by the function looking for entries of type RelationCache. Each such entry contains a ProcessorMask that tells you which processor(s) the entry applies to, and a CACHE_DESCRIPTOR that tells you what type of cache is being described and how big a cache line is for that cache.

Windows 7 adds the function GetLogicalProcessorInformationEx which does the RelationCache filtering for you.
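
A minimal sketch of the walk described above, with error handling abbreviated for clarity. It takes the line size from the first level-1 cache entry it finds; a real program might also examine the ProcessorMask and Cache.Type fields:

```c
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

/* Sketch: query the cache line size via GetLogicalProcessorInformation. */
static DWORD GetCacheLineSize(void)
{
    DWORD length = 0;
    DWORD lineSize = 0;

    /* First call fails with ERROR_INSUFFICIENT_BUFFER and tells us
       how many bytes of buffer we need. */
    GetLogicalProcessorInformation(NULL, &length);
    PSYSTEM_LOGICAL_PROCESSOR_INFORMATION info = malloc(length);
    if (!info) return 0;

    if (GetLogicalProcessorInformation(info, &length)) {
        DWORD count = length / sizeof(*info);
        for (DWORD i = 0; i < count; i++) {
            if (info[i].Relationship == RelationCache &&
                info[i].Cache.Level == 1) {
                lineSize = info[i].Cache.LineSize;
                break;
            }
        }
    }
    free(info);
    return lineSize;
}

int main(void)
{
    printf("L1 cache line size: %lu bytes\n", GetCacheLineSize());
    return 0;
}
```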

Comments (40)
  1. John says:

    I was going to make a joke about Vista requiring a super-computer for adequate performance, but I decided against it.  Vista is dead; long live Windows 7.

  2. What are the advantages of this method over using the CPUID instruction, which is supported on any OS running on an x86 chip from the 486 and later?

    [And if you’re not running on an x86 chip? -Raymond]
  3. Wojciech Gebczyk says:

    "When (…), you need to worry about the CPU cache line size(…)".

    Reminds me of someone that has been tuning HTML table content (those silly tags and its content) to reduce processor temperature/usage while rendering it.

    Are you sure your task queue isn't empty while you're writing such low-level hints? If it is, you can always work on Visual Studio performance or some other silly task…

    ;>

  4. GWO says:

    And if you’re not running on an x86 chip?

    Then the chances are very slim that you're running Windows, so it's a moot point.

    [Some of us develop for non-x86 Windows. I suspect a lot of people reading this Web site are not running x86 Windows. -Raymond]
  5. Andreas F. says:

    Since this will come up: an example that could use this is a prefetching, memory-resident B+ tree.

  6. Alexandre Grigoriev says:

    [I suspect a lot of people reading this Web site are not running x86 Windows. -Raymond]

    Well, the x64 architecture supports the CPUID instruction, too… And IA64 is deadbeef…

  7. Alexandre Grigoriev says:

    For what it's worth, by optimizing structure layout in my storport miniport (decoupling processor-specific data from globally used data, forcing cache line alignment) I was able to reduce CPU consumption per command by some 5%. Now, I'm almost hitting the OS limit (which I measured by short-circuiting the requests).

  8. dave says:

    What are the advantages of this method over using the CPUID instruction, which is supported on any OS running

    Does the OS really "support" the instruction?

    I suspect the actual situation is that the OS has no interest in your use of the instruction, successful or otherwise.

    But in any case, the advantage of using an OS-provided API rather than an architecture-specific instruction is that, you know, you get to be independent of processor architecture.

  9. Philip says:

    If you don’t want to bother being architecture-independent, you might as well just write:

    #define CACHE_LINE_SIZE 64

    Should be good for modern x86s and saves all this fiddling around with function calls and correctness.

  10. Joe says:

    Windows mobile people!

  11. Zan Lynx says:

    IA64 may be dead-ish. (I don't agree, I still love the arch, but I don't run Windows on it.)

    But low power computing in the form of netbooks is becoming very popular. Unless Microsoft wants to give that market to Linux they may need to release an ARM Windows-7 at some point.

    You want your application to work without rebuilding a lot of custom x86 code, right? Even if MS does binary recompiling from x86 to ARM (which I think they’d have to for back compatibility reasons) your application would run better in native machine code.

  12. 640k says:

    GetLogicalProcessorInformation is not supported on Windows Mobile.

  13. Cooney says:

    [Some of us develop for non-x86 Windows. I suspect a lot of people reading this Web site are not running x86 Windows. -Raymond]

    I would wager that non-x86 and high performance computing are a very small slice of the world, especially when you’re talking about windows: the top 500 list is dominated by x86 linux (not to start a flamewar here), so your best options for HPC are likely to be x86.

    Then again, maybe you’re doing a codec that has to run on a $0.50 part – now it isn’t HPC, but LPC – squeeze acceptable performance out of something slow as balls.

    [I wasn’t responding to the HPC part of the statement, just the “Why bother? Let’s just hard-code x86 since that’s all anybody uses anyway” part. -Raymond]
  14. Leo Davidson says:

    "Unless Microsoft wants to give that market to Linux they may need to release an ARM Windows-7 at some point."

    I hope ATOM improves enough to make ARM not worth it, then.

    I just got away from having to target ANSI and Unicode only to have it replaced by 32-bit and 64-bit. If ARM and non-ARM builds are added on top of that then I’m going to strangle a kitten.

    ARM Debug

    ARM Release

    x86 Debug

    x86 Release

    x64 Debug

    x64 Release

    ^^^ No thanks!

  15. Owen S says:

    "I hope ATOM improves enough to make ARM not worth it, then."

    I hope the other way. I hope Atom continues sucking enough to allow ARM to claw its way into higher performance environments.

    Then the abomination called x86 can be strangled. Like it should have been. In 1979.

  16. Michael Fuller says:

    you need to worry about the CPU cache line size in order to avoid issues like false sharing.

    Isn't this information more useful at compile time, e.g. to pad data structures to avoid cache line ping-pong, rather than at runtime?

    How can it be used at run time to avoid false sharing?

  17. Zan Lynx says:

    You might be writing a JITter or a shared library like pthreads or you are dynamically allocating memory for thread work queues.

  18. Daniel ZY says:

    Debugging on 64 bit with Visual Studio is easy enough. Just copy the 64-bit remote debugger stub to the remote system and run it.

  19. Jonathan Wilson says:

    Another thing that annoys me is the number of OEMs shipping machines with top-of-the-line CPUs with x86-64 support and yet installing the 32-bit version of Windows. If it's got a 64-bit CPU, why would you not use the 64-bit version of Windows?

  20. cooney says:

    Microsoft doesn't even release most of its software for 64 bit! Visual Studio is still a 32 bit app. MS Office will be 64 bit before Visual Studio.

    Do you really need VC++ to address 4G? Is there a limitation when debugging 64 bit apps from a 32 bit app?

  21. Nick says:

    "Do you really need VC++ to address 4G?"

    Ha, I just had a vision of times gone past…

    (push grayscale filter)

    "Do you really need BASIC to address more than 64KB?"

    (pop grayscale filter)

    There are a lot more reasons to migrate applications to the native architecture besides having an n-bit address space. Emulation slows things down and makes interop harder. x64 provides more registers. Etc.

    I’m not exactly in the x64-or-bust club, but it will be nice when I can have a 64-bit system where all my normal applications run natively.

  22. I have a sneaking suspicion that Windows 7 is just what Vista was supposed to be (but to make the deadline they shipped Windows 6.x and labelled it Vista). Any inside info Raymond (that won’t get you in trouble :)

    [If that were true, then I wasted all that time working on Windows 7 specs – I could’ve just taken the leftover ones from Windows Vista! You can see for yourself what Windows Vista was supposed to be: Just check out PDC 2003. -Raymond]
  23. Lawrence says:

    Nick, how exactly is the Visual Studio EXE being 32-bit impacting you?

    It’s not limiting YOU to creating 32bit applications, and it interops just fine with the other applications on my 64bit system.

    As for it 'slowing things down' – I suspect it will be some time before a 64-bit Visual Studio is able to show a performance improvement over the 32-bit version. [In a heroic attempt to drag the conversation back on topic] I suggest VC++ has been heavily optimised over the years to compile fastest with assumptions around things like CPU cache line size, which no longer apply when they flick the 64-bit switch.

  24. 640k says:

    Debugging 64-bit apps is not as easy as debugging 32-bit apps. Of course it's easy if you have all the time in the world. But it shouldn't have to be harder than setting a breakpoint + F5. If you want it to be hard and time-consuming to do the job, you should run Linux. Cross compiling + debugging can never be as smooth as a native solution. There are a LOT of features missing because VS isn't a native 64-bit app. Edit & continue, to name one. But the show stoppers are usually interop with other programs which are 64-bit only (IIS). And how are people supposed to develop 64-bit apps when MS itself cannot make the most basic development tool 64-bit to begin with? They should eat their own dog food, and hopefully will have done so before 20100322.

  25. Cooney says:

    There is a lot more reason to migrating applications to the native architecture besides having an n-bit address space. Emulation slows things down and makes interop harder. x64 provides more registers. Etc.

    I'm not actually familiar with winx64 – I'm coming from a world where you can just run an x32 binary on an x64 OS – no thunks or emulation. Basically, unless you need to address past 4G (which may be required, but only from the debugger), you tend to keep the app as x32.

    Basically, I expect x32 Office to just work on x64 Windows, and I'm not clear when Office would ever need x64 addressing.

    And how are people supposed to develop 64-bit apps when ms itself cannot develop the most basic development tool to begin with?

    This is actually shocking – we have x64 builds for our C++ app that work fine, so I assumed that x64 VC++ was already done. We've had x64 OSes for going on 5 years, so shouldn't VC++ be a first-class thing?

  26. GWO says:

    Raymond: I suspect a lot of people reading this Web site are not running x86 Windows.

    That doesn't support the CPUID instruction? I doubt that.

  27. Karellen says:

    @Leo – What?!? Your code will have to compile and run on *3* whole architectures, including x86-32 and x86-64 separately? No mere human could possibly be expected to maintain a codebase so complex!

    :-)

  28. Kovensky says:

    Two other interesting cache related articles from one of the x264 developers:

    http://x264dev.multimedia.cx/?p=149

    http://x264dev.multimedia.cx/?p=201

    PS: I have seen assembly programmers complaining several times about the Win64 ABI being way more complicated than it should be, and I suppose it also slows things down compared to the x86_64 ABI other OSes use.

  29. Aram Hăvărneanu says:

    I don’t know anyone running Windows on non IA32(e) systems. The only time I’ve seen Windows on non-IA32 is at vendors that try to sell you non-IA32 machines. I don’t know anybody running Windows on Itanium in production. I’m testing my software on Itanium, and even some of my drivers, but I do it because portable code == better code, not because I actually had a customer that wanted Itanium.

    It’s a shame, really. It’s also a shame Microsoft dropped client Windows for non-IA32 machines (Windows XP 64 bit (Itanium) was canceled). It’s also a shame there are NO non-IA32 *workstations* on the market. How do I write software for Itanium if I can’t buy an Itanium workstation to develop on?

    It's also a shame Microsoft doesn't release anything besides server Windows and SQL Server on non-IA32 platforms. No Visual Studio, no Office — nothing. The architecture is sterile. Worse, Microsoft doesn't even release most of its software for 64 bit! Visual Studio is still a 32 bit app. MS Office will be 64 bit before Visual Studio.

  30. ulric says:

    The thread about CPUID is kind of obvious, because if you need to know the cache line size precisely, you probably are using CPUID already. If you're looking for an API for this, you probably should not worry about it. I like Philip's #define suggestion. :P

    @640k Our team (about 40) works on a large 64-bit C++ app with VC++. It works well (in fact the performance is great). Afaik, the lack of edit-and-continue in 64-bit is not related to the IDE being 32-bit; it's not there because they didn't port the code that patches up a running 64-bit process. What is not working right that is caused by the IDE being 32-bit? You can download WinDbg and use its native 64-bit debugger to figure out if a native 64-bit debugger helps. (For editing and compiling, obviously it's irrelevant.) I'm not really aware of anything except one crash exception that isn't caught by default.

  31. Paul says:

    Rico Mariani explained why VS is still only 32bit on his blog a while ago (http://blogs.msdn.com/ricom/archive/2009/06/10/visual-studio-why-is-there-no-64-bit-version.aspx)

    @640K : So you really think Visual Studio is "the most basic development tool"? I would hate to see your list of required features for a comprehensive development tool.

  32. Alexandre Grigoriev says:

    @Karellen,

    Are you really sure that some max 20MB of additional runtime libraries really stresses memory? How is that different from 20MB of data?

  33. Gabe says:

    Remember, when you port your app from 32-bit to 64-bit, the size of pointers doubles but the processor’s cache line size stays the same. This means that your app will run SLOWER as 64-bit — unless you actually use more than 4GB of address space.

    In fact, if your app doesn't need more than 4GB of address space, its fastest configuration would likely be as a 32-bit app on a 64-bit OS. It can use all 4GB of the 32-bit address space and still have all kinds of files cached in the several GB of memory that the 64-bit OS can directly address.

    The ONLY time you would want to compile your app for 64-bit is if it needs binary compatibility with other 64-bit systems (plug-ins, etc.) or it needs access to more than 4GB of address space. The reason you don’t see a 64-bit Visual Studio is that it would run slower than a 32-bit version and need all new plug-ins.

    If your app is taking up so much memory that you need 64 bits, try going on a diet before buying a new wardrobe!

  34. DriverDude says:

    "Another thing that annoys me is the number of OEMs shipping machines with top of the line CPUs with x86-64 support and yet install the 32 bit version of Windows."

    64-bit Windows runs 32-bit apps, but 64-bit apps do not run 32-bit plugins. That means the default IE on x64 does not run Flash. That’s a tech support nightmare for OEMs.

    There are parts of Office that do not run on x64, see http://support.microsoft.com/kb/927383

  35. Karellen says:

    32-bit apps on a 64-bit system cause slowdowns due to increased memory pressure. If you're running 64-bit only, you only need one copy of all the shared libraries, including system ones, in memory at once, shared between all the processes that use them. Different processes may need a few private pages of relocation info for each DLL, but that's tiny compared to the size of the DLL itself.

    If you launch a 32-bit app, not only do you need the private code for that app, but 32-bit versions of *all* the shared libraries used by that app on the system need to be loaded *again*, alongside the "real" versions for your preferred architecture. That’s the C library, graphics libraries, support libraries, etc… – basically MB upon MB of code that is *already* loaded on your system, only in a more "native" manner.

    More memory pressure implies more swapping implies less performance.

    Further, all those libraries are also twice as much code to keep patched for security updates, there are twice as many updates to download. Basically, all the advantages you get from using shared libraries are suddenly halved by running a dual-arch system.

    Yes, if you’ve got a proprietary 32-bit app that the vendor is too lazy, incompetent or plain out-of-business to recompile/release as 64-bit, then a multi-arch system may be an undesirable necessity. But it’s not a situation anyone should be *happy* about being in.

  36. Paul says:

    @DriverDude: I am running Windows 7 x64 and I am fairly sure it uses the 32 bit version of IE by default anyway.

  37. Karellen says:

    "Are you really sure that some max 20MB of additional runtime libraries"

    Hmmm….I thought it would be more than that, but looking on my system the large apps are only using about 20-30Mb of shared memory. Still, yes, if I could reduce a large app’s footprint by 20Mb, even though it’s only 0.5% of my RAM, I wouldn’t say no to that.

    "Remember, when you port your app from 32-bit to 64-bit, the size of pointers doubles but the processor’s cache line size stays the same. This means that your app will run SLOWER as 64-bit."

    Mmmmm….., but OTOH there are more registers on x86-64, so machine code is generally cleaner and more straightforward with fewer redundant loads/saves shuffling temporaries around. Which takes time and space. Certainly not as much time as an instruction cache miss, and probably not as much space as the extra memory for pointer data, but it’s not quite as simple as "the pointers are wider" when you get down that low.

    And it's nice to be able to mmap any file on the file system without having to think about it, instead of having two paths: the mmap version, which is fast and efficient, and the slower open/read/close path, which copies every single byte but has to be used on large files – exactly where you could really use the performance.

  38. ulric says:

    lots of myth about 64-bit here…

    64-bit apps do run slower, because they take up more RAM. The additional registers make no difference; the bottleneck is in cache and memory access. It's not possible to run a 64-bit Windows where there are no 32-bit apps or processes anywhere, so don't sweat about running 32-bit apps on it: the 32-bit shared libraries were loaded long ago. :)

    @Paul really good link, thanks!

  39. Aram Hăvărneanu says:

    ulric said: "It’s not possible to run a 64-bit Windows where there are no 32-bit apps or processes anywhere, so don’t sweat about running 32-bits on it : the 32-bit shared libraries have been loaded long ago."

    Oh yes it is. On Windows Server 2008 R2, at least. You can have a pure 64-bit OS.

  40. Karellen says:

    @Ulric – I wasn't just talking about Windows; I was talking about the general merits of running single-arch vs. multi-arch systems. I run some (non-Windows) pure 64-bit systems quite happily, thanks, and am loath to install dozens of megs of 32-bit support libraries (the libraries installed are much greater than the shared memory used; although disk space is more plentiful than RAM, you still have the issues of extra security updates, etc…) for one or two apps that the proprietary vendor *still* can't get right, despite 64-bit systems being available for many years now.

Comments are closed.