Using GetLogicalProcessorInformationEx to see the relationship between logical and physical processors


Today's Little Program uses the GetLogicalProcessorInformationEx function to print the mapping of logical processors to physical processors, as well as the mapping of logical processors to packages. (A dual-core processor is a single package with two cores. If those cores are themselves dual-hyperthreaded, then you have four logical processors total.)

#define STRICT
#include <windows.h>
#include <stdio.h>

template<typename T>
T *AdvanceBytes(T *p, SIZE_T cb)
{
 return reinterpret_cast<T*>(reinterpret_cast<BYTE *>(p) + cb);
}

The AdvanceBytes helper function takes a typed pointer and adds a byte offset to it. This is just a typing-saver function.

class EnumLogicalProcessorInformation
{
public:
 EnumLogicalProcessorInformation(LOGICAL_PROCESSOR_RELATIONSHIP Relationship)
  : m_pinfoBase(nullptr), m_pinfoCurrent(nullptr), m_cbRemaining(0)
 {
  DWORD cb = 0;
  if (GetLogicalProcessorInformationEx(Relationship,
                                       nullptr, &cb)) return;
  if (GetLastError() != ERROR_INSUFFICIENT_BUFFER) return;

  m_pinfoBase =
   reinterpret_cast<SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX *>
                                     (LocalAlloc(LMEM_FIXED, cb));
  if (!m_pinfoBase) return;

  if (!GetLogicalProcessorInformationEx(Relationship, 
                                        m_pinfoBase, &cb)) return;

  m_pinfoCurrent = m_pinfoBase;
  m_cbRemaining = cb;
 }

 ~EnumLogicalProcessorInformation() { LocalFree(m_pinfoBase); }

 void MoveNext()
 {
  if (m_pinfoCurrent) {
   m_cbRemaining -= m_pinfoCurrent->Size;
   if (m_cbRemaining) {
    m_pinfoCurrent = AdvanceBytes(m_pinfoCurrent,
                                  m_pinfoCurrent->Size);
   } else {
    m_pinfoCurrent = nullptr;
   }
  }
 }

 SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX *Current()
                                         { return m_pinfoCurrent; }

private:
 SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX *m_pinfoBase;
 SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX *m_pinfoCurrent;
 DWORD m_cbRemaining;
};

Enumerating logical processor information is complicated by the variable-size structures, so I wrap it inside this helper enumerator class.

Construct it with the relationship you are interested in, then use Current() to see the current item and MoveNext() to move to the next item. When there are no more items, Current() returns nullptr.

The constructor does the standard two-step query we've seen before: Ask for the required buffer size, then allocate a buffer, then ask for the buffer to be filled in. There is a TOCTTOU race condition if a processor is added dynamically between the two calls: the second call fails because the buffer is now too small, and the enumerator comes up empty. I'm going to ignore that case because this is a Little Program.

Since the SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX structure is variable-sized, walking the packed array is not a simple array indexing operation. Instead, you have to bump the pointer by the Size of the current element to find the next element.

Next comes a helper function to print processor affinity bitmasks.

void PrintMask(KAFFINITY Mask)
{
 printf(" [");
 for (int i = 0; i < sizeof(Mask) * 8; i++) {
  if (Mask & (static_cast<KAFFINITY>(1) << i)) {
   printf(" %d", i);
  }
 }
 printf(" ]");
}

Nothing exciting there.

Finally, we wrap it up inside a sample program that enumerates the cores and then, just for fun, enumerates the packages.

int __cdecl main(int argc, char **argv)
{
 for (EnumLogicalProcessorInformation enumInfo(RelationProcessorCore);
      auto pinfo = enumInfo.Current(); enumInfo.MoveNext()) {
   PrintMask(pinfo->Processor.GroupMask[0].Mask);
   printf("\n");
 }

 for (EnumLogicalProcessorInformation enumInfo(RelationProcessorPackage);
      auto pinfo = enumInfo.Current(); enumInfo.MoveNext()) {
   printf("[");
   for (UINT GroupIndex = 0; GroupIndex < pinfo->Processor.GroupCount; GroupIndex++) {
    PrintMask(pinfo->Processor.GroupMask[GroupIndex].Mask);
   }
   printf(" ]\n");
 }

 return 0;
}

Enumerating processor cores produces a bunch of PROCESSOR_RELATIONSHIP structures, each with a single group that describes the logical processors assigned to the core.

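Enumerating processor packages also produces a bunch of PROCESSOR_RELATIONSHIP structures, but here the GroupCount can be greater than 1: there is one group mask for each processor group the package spans. On most systems a package fits inside a single group, so you get just one mask listing all of the package's logical processors.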

Bonus chatter: The CoreInfo utility from Sysinternals is a command-line tool that is a fancier version of this Little Program.

Comments (13)
  1. Chuck says:

    Raymond, you read my mind, I was studying this exact thing a few weeks ago and decided to settle with using the CoreInfo utility. Thank you for your post!

  2. jonwil says:

    Why would Windows (and apps, for that matter) need to care about cores vs. packages? How is a dual-core CPU different from two single-core CPUs?

  3. Mark Sowul says:

    @jonwil – two examples off the top of my head:

    1) NUMA (there are other APIs for this, but it is an extreme case of dual-core being different from two single cores)

    2) Licensing (per-core vs. per-package)

  4. Mike S says:

    The processors in a dual core chip often share caches, making it cheaper to move a process from one to another on the same chip, rather than move it to another one.

  5. alegr1 says:

    @jonwil:

    You want to keep a process within one package (node), to reduce cache coherency traffic between sockets.

  6. Adam Rosenfield says:

    I don't think I've ever seen code before that declared a variable in the condition of a for loop like this, though it's perfectly valid:

    for ( …; auto pinfo = enumInfo.Current(); …) { … }

  7. alegr1 says:

    @Adam Rosenfield:

    Same as in a while() condition.

  8. Thanks a lot. Wikipedia could certainly use this post.

    Although the "Little Program" with capital L and capital P scares me a little.

  9. saveddijon says:

    @alegr1:

    Keeping all processes in one node doesn't always help you – unless you can keep their memory on that node as well.

    On an AMD dual-socket system (or single-socket MCM such as Magny-Cours) each die has its own DDR controller, and the cache coherency controller to go with it. If you have processes on one die executing against memory hosted by the other die then you will still have lots of inter-die traffic simply managing the cache probes, even if no data ever moves around.

  10. Neil says:

    I once had to deal with a network provider that wouldn't bother to calculate the total buffer size; instead, it just told you how much it tried to allocate when it found you hadn't provided enough. So if you provided no buffer, it would say you didn't even have enough for the basic structure, and only when you provided that did it admit that you needed a bigger buffer to cover the optional fields.

  11. Mal DeMer says:

    @Adam Rosenfield: You need to get out more. :) This is perfectly legit, and nice because it limits the scope of the loop variable to that of the loop.

  12. alegr1 says:

    @saveddijon:

    Memory affinity doesn't have anything to do with cache coherency.

    "Near" and "far" memory only matters when you need to fetch data not in the local cache. But even with "near" memory, the requesting CPU still needs to broadcast a request to all nodes, to see whether the data is in another socket's cache.

  13. saveddijon says:

    @alegr1:

    Not fully true. If the coherency controller supports a cache directory (AMD calls it "probe filter") then you do not necessarily need to broadcast to all nodes. The coherency controller already knows who has the data, and if the coherency controller for the memory is local to "your" die then you save a bit of I/O.

    This is a big win for precisely the multi-die case, where you want to avoid all unnecessary inter-die communication.
