Using MAP files - part 1

jbroxson@microsoft.com

 

Back in February, the Doctor talked about manually unwinding stacks.  MAP files are a great help when you are doing this, and they are your best tool for resolving the code addresses found on the stack.  I was recently working on an issue that required converting a lot of unresolved stacks and trying to pin down what was happening on several devices.  Doing this reminded me of all the tricks, and the Doctor and I decided we should document them for everyone.

 

In part one we’re going to talk about what a MAP file is.  In part two, we’ll talk about how you can use this information for resolving stacks.

 

Today, symbol files contain a lot of information, including extras such as source line information.  To get pretty much ALL of that data in human readable form, you would need both MAP and COD files.  Most people don’t want COD files getting out because they contain their source code.  MAP files are just the basics: offsets for functions, globals, and other data.  That is plenty of information for what we need to do, which is resolve a stack, and you should already have them in your flat release directory.

 

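I'm going to use COREDLL as an example (I’ve randomly trimmed this a lot). Text marked with <<<< or >>>> is my annotation and is not part of the file. The preferred load address, the segmented addresses, and the Rva+Base column will be discussed after the example: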

 

Coredll <<<< the module name

 

 Timestamp is d600142f <<<< timestamp when it was built

 

 Preferred load address is 10000000 <<<< Where the module wants to load. Don’t trust this!

 

>>>> Begin information about the sections in this module.

 Start Length Name Class

 0001:00000000 00005c44H .rdata CODE

 0001:00005c44 00000024H .rdata$debug CODE

 0001:00005c68 000003c4H .rdata$r CODE

 0001:0000602c 00065338H .text CODE

 0001:0006c1f0 0000c36eH .edata CODE

 0002:00000000 00000004H .CRT$XCA DATA

 0002:00000004 00000004H .CRT$XCAA DATA

 0002:00000028 000009acH .data DATA

 0002:000009e0 00000384H .bss DATA

 0003:00000000 00005310H .pdata DATA

 0004:00000000 000002b0H .rsrc$01 DATA

 

>>>> Begin actual symbolic information

  Address Publics by Value Rva+Base Lib:Object

 

 0000:00000000 ___safe_se_handler_count 00000000 <absolute>

 0000:00000000 ___safe_se_handler_table 00000000 <absolute>

 0001:00000030 ??_7exception@std@@6B@ 10001030 coredll_ALL:stdexcpt.obj

 0001:00000174 ??_C@_17LDADEION@?$AAI?$AAM?$AAE?$AA?$AA@ 10001174 coredll_ALL:Imm.obj

 0001:0000017c ??_C@_13COJANIEC@?$AA0?$AA?$AA@ 1000117c coredll_ALL:Imm.obj

 0001:00000180 ??_C@_15KNBIKKIN@?$AA?$CF?$AAd?$AA?$AA@ 10001180 coredll_ALL:Imm.obj

 0001:00000434 ??_7logic_error@std@@6B@ 10001434 coredll_ALL:string.obj

>>>> Ok, enough of that. It’s ugly and not relevant to us. Further in, we start seeing the following:

 

 0001:00000960 cszTimeZones 10001960 coredll_ALL:time.obj

 0001:00000978 NormalYearDaysBeforeMonth 10001978 coredll_ALL:time.obj

 0001:00000994 LeapYearDaysBeforeMonth 10001994 coredll_ALL:time.obj

 0001:000009b0 NormalYearDayToMonth 100019b0 coredll_ALL:time.obj

 0001:00000b20 LeapYearDayToMonth 10001b20 coredll_ALL:time.obj

<<<< Those are global variables. I’ll tell you how I knew that soon.

 

>>>> Now things get interesting. Notice that in the 4th column there is an “f”? That stands for function. (that’s how I knew the ones above were globals... no “f”)

 0001:0000602c mbstowcs 1000702c f coredll_ALL:coredll.obj

 0001:00006114 wcstombs 10007114 f coredll_ALL:coredll.obj

 0001:00006210 RegisterDlgClass 10007210 f coredll_ALL:coredll.obj

 0001:000062a4 CoreDllInit 100072a4 f coredll_ALL:coredll.obj

  

>>>> More functions, but they look kinda funny. That’s because they are “decorated” or “mangled”. More on this in part two.

 0001:00006338 ??0exception@std@@QAA@XZ 10007338 f coredll_ALL:stdexcpt.obj

 0001:00006354 ??0exception@std@@QAA@PBD@Z 10007354 f coredll_ALL:stdexcpt.obj

 0001:000063a0 ??0exception@std@@QAA@ABV01@@Z 100073a0 f coredll_ALL:stdexcpt.obj

 0001:00006404 ??1exception@std@@UAA@XZ 10007404 f coredll_ALL:stdexcpt.obj
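
To make the column layout concrete, here is a minimal sketch in Python that splits one "Publics by Value" line into its fields and checks for the "f" flag. It assumes only the layout shown in the excerpt above; real MAP files may carry extra flags that this pattern does not handle. The names PublicSymbol and parse_public_line are hypothetical, chosen for illustration.

```python
import re
from typing import NamedTuple, Optional


class PublicSymbol(NamedTuple):
    segment: int         # segment number from the first column
    offset: int          # offset within that segment
    name: str            # symbol name (possibly decorated)
    rva_plus_base: int   # value from the Rva+Base column
    is_function: bool    # True when the "f" flag is present
    lib_object: str      # Lib:Object column


# Assumed layout: seg:off  name  rva+base  [f]  lib:object
_PUBLIC_LINE = re.compile(
    r"^\s*([0-9a-fA-F]{4}):([0-9a-fA-F]{8})\s+"  # segment:offset
    r"(\S+)\s+"                                   # symbol name
    r"([0-9a-fA-F]{8})\s+"                        # Rva+Base
    r"(?:(f)\s+)?"                                # optional function flag
    r"(.+?)\s*$"                                  # Lib:Object
)


def parse_public_line(line: str) -> Optional[PublicSymbol]:
    """Return the parsed fields, or None if the line is not a symbol line."""
    m = _PUBLIC_LINE.match(line)
    if m is None:
        return None
    seg, off, name, addr, flag, lib = m.groups()
    return PublicSymbol(int(seg, 16), int(off, 16), name,
                        int(addr, 16), flag == "f", lib)


sym = parse_public_line(" 0001:00006114 wcstombs 10007114 f coredll_ALL:coredll.obj")
print(sym.name, sym.is_function)  # wcstombs True
```

Lines from the section table do not match the pattern, so they come back as None rather than as bogus symbols.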

  

I said above that you should not trust the “Preferred load address”.  There is a simple reason for this.  It is preferred.  If I do a findstr on MAP files in one of my flat release directories for the string “Preferred load address is 10000000”, I get back 582 hits. We know that all those files aren’t loading at the same address. Some binaries will be rebased to a specific pre-determined address, others will be dynamically rebased when they are loaded. In part two, I will talk about ways to determine the true load address for a module.
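
The findstr check described above can also be done with a short script. This is a rough sketch in Python, not a tool that ships with the build system. It assumes the MAP files sit directly in the directory you pass in and carry a .map extension; the function name count_preferred_bases is hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

# Matches the header line shown in the example, e.g.
# " Preferred load address is 10000000"
_PREFERRED = re.compile(r"Preferred load address is ([0-9a-fA-F]+)")


def count_preferred_bases(release_dir):
    """Count how many MAP files in release_dir claim each preferred base."""
    counts = Counter()
    # Note: this glob is case-sensitive on case-sensitive file systems.
    for map_path in Path(release_dir).glob("*.map"):
        text = map_path.read_text(errors="replace")
        m = _PREFERRED.search(text)
        if m:
            counts[int(m.group(1), 16)] += 1
    return counts


if __name__ == "__main__":
    import sys
    # Usage (hypothetical script name): python count_bases.py <flat release dir>
    if len(sys.argv) > 1:
        for base, n in count_preferred_bases(sys.argv[1]).most_common():
            print(f"{base:08x}: {n} module(s)")
```

If many modules report the same base, that is a good reminder that the value is only a preference and not where each module actually ends up.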

 

Above you will also notice the addresses in the first column that look like this: 0001:00005c44. This is a segmented address. Without going into too much explanation, a segmented address is an offset relative to a segment, written in the form segment:offset. In this case, the address represents an offset of 0x5c44 bytes into segment number 0x0001. So, if segment 0x0001 began at 0x01000000, this would represent an address of 0x01005c44.
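
The segment:offset arithmetic can be sketched in a few lines of Python. The helper names parse_segmented and resolve_segmented are hypothetical, and the segment start of 0x01000000 is just the made-up value from the paragraph above, not a real load address.

```python
def parse_segmented(address):
    """Split 'ssss:oooooooo' into (segment, offset) as integers."""
    seg_text, off_text = address.split(":")
    return int(seg_text, 16), int(off_text, 16)


def resolve_segmented(address, segment_starts):
    """Add the offset to the start of its segment.

    segment_starts maps a segment number to where that segment begins;
    the caller has to supply it, since the segmented address alone
    does not say where the segment lives.
    """
    segment, offset = parse_segmented(address)
    return segment_starts[segment] + offset


# Hypothetical segment start taken from the example in the text.
starts = {0x0001: 0x01000000}
print(hex(resolve_segmented("0001:00005c44", starts)))  # 0x1005c44
```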

 

Another term above that should be defined is Rva+Base. RVA is the Relative Virtual Address. Base is the Preferred Load Address I already told you about. So, let’s look at the following line:

 

Address Publics by Value Rva+Base Lib:Object

0001:00006114 wcstombs 10007114 f coredll_ALL:coredll.obj

 

Here, the RVA+Base is 0x10007114. We know from the top of the file that the Preferred load address is 0x10000000. Subtracting that, we are left with 0x7114. This is the RVA: the offset from the module’s base address to the beginning of the wcstombs function. We can learn something else here as well. The segmented address is 0001:00006114. If we subtract 0x6114 from 0x7114 we get 0x1000. Why does this difference exist? Well, 0001:00006114 is segmented. It is an offset into segment 0x0001. We now know that segment 0x0001 begins at RVA 0x1000, which is 4 KB past the module’s base.
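
The same subtraction, written out as a small Python sketch using the values from the wcstombs line. The function names are hypothetical; the numbers all come from the MAP excerpt above.

```python
def rva_from_map(rva_plus_base, preferred_base):
    """Strip the preferred load address to get the plain RVA."""
    return rva_plus_base - preferred_base


def segment_start_rva(rva, segment_offset):
    """Work out where a segment begins, given one symbol inside it."""
    return rva - segment_offset


PREFERRED_BASE = 0x10000000   # "Preferred load address" from the header

rva = rva_from_map(0x10007114, PREFERRED_BASE)
seg1_start = segment_start_rva(rva, 0x6114)   # offset part of 0001:00006114
print(hex(rva), hex(seg1_start))  # 0x7114 0x1000
```

Running the same calculation on any other symbol in segment 0x0001, such as mbstowcs (0001:0000602c, 1000702c), gives the same segment start, which is a quick sanity check that you are reading the columns correctly.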

 

Ok. So now you’ve seen a MAP file, and have a couple of clues about what’s in it. In part two, we will talk about using this information to resolve call stacks.