Does __fastcall make a difference for C++ classes?

We're running through a routine round of code reviews of the audio engine, and I noticed the following code (obscured):

HRESULT __fastcall CSomeClass::SomeMethod(SomeParameters);  

I looked at it a couple of times, because it seemed like it was wrong.  The thing that caught my eye was the "__fastcall" declaration.  __fastcall is a Microsoft C++ extension that allows the compiler to put the first 2 DWORD parameters to the routine into the ECX and EDX registers (obviously it's 32-bit x86 only).
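To make the register assignment concrete, here is a minimal sketch of a plain (non-member) function declared with __fastcall. The FASTCALL macro and the AddThree function are hypothetical names made up for illustration; the macro only expands to __fastcall for 32-bit x86 builds with the Microsoft compiler, so the snippet should still compile elsewhere.

```cpp
// FASTCALL is a hypothetical helper macro (an assumption, not part of the
// code under review): it expands to __fastcall only for 32-bit x86 MSVC
// builds and to nothing everywhere else.
#if defined(_MSC_VER) && defined(_M_IX86)
#  define FASTCALL __fastcall
#else
#  define FASTCALL
#endif

// With __fastcall on 32-bit x86 MSVC, the arguments are expected to go:
//   a -> ECX
//   b -> EDX
//   c -> pushed on the stack
int FASTCALL AddThree(int a, int b, int c)
{
    return a + b + c;
}
```

The calling convention only changes how the arguments travel to the function, so the result is the same either way; the difference is visible only in the generated code.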

But when compiling C++ code, the default calling convention is "thiscall", and in the thiscall convention, the "this" pointer is passed in the ECX register, which seems to collide with the __fastcall declaration.

So does it make a difference?  I could have left a code review comment and made the person who owned the code run through the exercise, but I figured why not figure out the answer myself?  And, to be honest, I found the path to the answer almost more interesting than the answer itself.

As I usually do in these cases, I wrote a tiny little test application to test it out:

class fctest
{
    int _member;
public:
    fctest(void);
    ~fctest(void);
    int __fastcall FastcallFunction(int *param1, int *param2)
    {
        return *param1 * *param2;
    }
    int ThiscallFunction(int *param1, int *param2)
    {
        return *param1 * *param2;
    }
};

int _tmain(int argc, _TCHAR* argv[])
{
    fctest test;
    int param1, param2;
    int result;
    result = test.FastcallFunction(&param1, &param2);
    result = test.ThiscallFunction(&param1, &param2);
    return 0;
}

I compiled it for "Retail", and then I looked at the generated output.  Somewhat to my surprise, the code generated was:

main:
    xor eax, eax
    ret

Yup, the compiler had optimized out my entire program.  Crud, back to the drawing board.

Try #2:

int _tmain(int argc, _TCHAR* argv[])
{
    fctest test;
    int param1, param2;
    int result;
    result = test.FastcallFunction(&param1, &param2);
    printf("%d: %d: %d", param1, param2, result);
    result = test.ThiscallFunction(&param1, &param2);
    printf("%d: %d: %d", param1, param2, result);
    return 0;
}

This one was somewhat better:

main:
    mov eax, [sp]
    imul eax, [sp+4]
    <call to printf #1>
    <call to printf #2>
    xor eax, eax
    ret

Hmm, that wasn't much of an improvement.  The compiler realized that FastcallFunction and ThiscallFunction did the same thing and not only did it inline the call, but it optimized out the 2nd call.

Try #3:

int _tmain(int argc, _TCHAR* argv[])
{
    fctest test;
    int param1, param2;
    int result;
    param1 = rand();
    param2 = rand();
    result = test.FastcallFunction(&param1, &param2);
    printf("%d: %d: %d", param1, param2, result);
    param1 = rand();
    param2 = rand();
    result = test.ThiscallFunction(&param1, &param2);
    printf("%d: %d: %d", param1, param2, result);
    return 0;
}

Try #3's code:

main:
    call rand
    mov [sp], eax
    call rand
    mov [sp+4], eax
    mov eax, [sp]
    imul eax, [sp+4]
    <call to printf #1>
    call rand
    mov [sp], eax
    call rand
    mov [sp+4], eax
    mov eax, [sp]
    imul eax, [sp+4]
    <call to printf #2>
    xor eax, eax
    ret

Much better: now at least both multiplications show up in the output.  But the stupid functions are STILL inlined, so I haven't learned anything yet.
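One way to keep a compiler from inlining small methods like these, without touching the project layout, is to mark them as non-inlinable. The sketch below is an assumption about how that could look, not the route this post takes: NOINLINE and FASTCALL are hypothetical helper macros wrapping __declspec(noinline) (Microsoft compiler), __attribute__((noinline)) (GCC and Clang), and __fastcall.

```cpp
// Hypothetical helper macros (assumptions for this sketch, not part of the
// original test program).
#if defined(_MSC_VER)
#  define NOINLINE __declspec(noinline)
#else
#  define NOINLINE __attribute__((noinline))
#endif

// __fastcall only matters for 32-bit x86 MSVC builds; expand to nothing elsewhere.
#if defined(_MSC_VER) && defined(_M_IX86)
#  define FASTCALL __fastcall
#else
#  define FASTCALL
#endif

class fctest
{
public:
    // Both methods are declared here and defined out of line below, and both
    // ask the compiler not to inline them at the call site.
    NOINLINE int FASTCALL FastcallFunction(int *param1, int *param2);
    NOINLINE int ThiscallFunction(int *param1, int *param2);
};

int FASTCALL fctest::FastcallFunction(int *param1, int *param2)
{
    return *param1 * *param2;
}

int fctest::ThiscallFunction(int *param1, int *param2)
{
    return *param1 * *param2;
}
```

Calling both methods with values the optimizer cannot predict (rand(), as in Try #3) should then leave two real call instructions to look at in the disassembly, although how a particular compiler version treats the hint is worth checking rather than assuming.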

Try #4: I moved fctest into its own source file (I'm not going to show the source code for this one).

The code for this one finally got it right:

    param1 = rand();
00401029 call rand (401131h)
0040102E mov dword ptr [esp+4],eax
    param2 = rand();
00401032 call rand (401131h)
00401037 mov dword ptr [esp],eax
    result = test.FastcallFunction(&param1, &param2);
0040103A lea eax,[esp]
0040103D push eax
0040103E lea edx,[esp+8]
00401042 lea ecx,[esp+0Ch]
00401046 call fctest::FastcallFunction (4010E0h)
    printf("%d: %d: %d", param1, param2, result);
0040104B mov ecx,dword ptr [esp]
    param1 = rand();
00401062 call rand (401131h)
00401067 mov dword ptr [esp+4],eax
    param2 = rand();
0040106B call rand (401131h)
00401070 mov dword ptr [esp],eax
    result = test.ThiscallFunction(&param1, &param2);
00401073 lea eax,[esp]
00401076 push eax
00401077 lea ecx,[esp+8]
0040107B push ecx
0040107C lea ecx,[esp+10h]
00401080 call fctest::ThiscallFunction (4010F0h)

So what's in all this gobbledygook?

Well, the relevant parts are the instructions from 0x40103A to 0x401046 and from 0x401073 to 0x401080.  Side by side, they are:

0040103A lea eax,[esp]                                  00401073 lea eax,[esp]
0040103D push eax                                       00401076 push eax
0040103E lea edx,[esp+8]                                00401077 lea ecx,[esp+8]
                                                        0040107B push ecx
00401042 lea ecx,[esp+0Ch]                              0040107C lea ecx,[esp+10h]
00401046 call fctest::FastcallFunction (4010E0h)        00401080 call fctest::ThiscallFunction (4010F0h)

Note that in both calls, the ECX register is loaded with the address of "test", and the 2nd parameter is pushed onto the stack.  But in the fastcall call, the 1st parameter is loaded into the EDX register; in the thiscall call, it's pushed onto the stack.

So yes, __fastcall makes a difference for C++ classes.  Not as much as it does for C functions, but it DOES make a difference.