What is the difference between AMD64 & EM64T, really?

Yes, I've had inside information about all of this for quite some time, but none of it is confidential. It's all discoverable; for all I know, there are already websites out there that provide this detail. The trouble is that both companies seem to be doing their best to protect their own interests (which is completely understandable), so I've found it tough to get a clear answer on this question. Here it is, to the best of my understanding:

AMD64 & EM64T are compatible with each other in much the same way that the AthlonXP and Pentium III/4 are compatible: each chip supports a different set of extensions to the basic x86 instruction set. As long as you stay away from the vendor-specific extensions, your code will execute just fine on both architectures. The current generation of AMD64 CPUs (Athlon64, FX, and Opteron) implements 3DNow! Professional, which is basically MMX (ick), SSE, SSE2, and 3DNow!. EM64T CPUs implement SSE3, SSE2, SSE, and MMX (ick). In practice, that makes MMX, SSE, and SSE2 the common ground, while 3DNow! is AMD-only and SSE3 is Intel-only.
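
The usual way to "stay away" from the vendor-specific extensions is to ask the CPU what it supports at run time (via the CPUID instruction) and only take an SSE3 or 3DNow! code path when the corresponding bit is set. Here's a minimal sketch, assuming GCC or Clang for `<cpuid.h>` and `__get_cpuid` (MSVC has a `__cpuid` intrinsic in `<intrin.h>` instead). The `cpu_features` struct and the function names are made up for illustration; the bit positions are the documented ones (leaf 1 EDX bits 23/25/26 for MMX/SSE/SSE2, leaf 1 ECX bit 0 for SSE3, extended leaf 0x80000001 EDX bits 29/30/31 for 64-bit long mode, AMD's extended 3DNow!, and 3DNow!).

```c
#include <stdbool.h>
#include <stdio.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h> /* GCC/Clang wrapper for the CPUID instruction */
#define HAVE_CPUID 1
#endif

/* Hypothetical summary of the extensions discussed above. */
typedef struct {
    bool mmx, sse, sse2;          /* common to both vendors */
    bool sse3;                    /* EM64T */
    bool amd_3dnow, amd_3dnowext; /* AMD64 only */
    bool long_mode;               /* 64-bit mode, both vendors */
} cpu_features;

/* Decode raw CPUID register values: leaf 1 ECX/EDX, leaf 0x80000001 EDX. */
static cpu_features decode_features(unsigned leaf1_ecx, unsigned leaf1_edx,
                                    unsigned ext1_edx)
{
    cpu_features f;
    f.mmx          = (leaf1_edx >> 23) & 1u;
    f.sse          = (leaf1_edx >> 25) & 1u;
    f.sse2         = (leaf1_edx >> 26) & 1u;
    f.sse3         = leaf1_ecx & 1u;
    f.long_mode    = (ext1_edx >> 29) & 1u;
    f.amd_3dnowext = (ext1_edx >> 30) & 1u;
    f.amd_3dnow    = (ext1_edx >> 31) & 1u;
    return f;
}

/* Query the running CPU; reports no features when built for non-x86. */
static cpu_features query_features(void)
{
    unsigned leaf1_ecx = 0, leaf1_edx = 0, ext1_edx = 0;
#ifdef HAVE_CPUID
    unsigned eax, ebx, ecx, edx;
    /* __get_cpuid returns 0 if the requested leaf isn't supported. */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        leaf1_ecx = ecx;
        leaf1_edx = edx;
    }
    if (__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        ext1_edx = edx;
#endif
    return decode_features(leaf1_ecx, leaf1_edx, ext1_edx);
}

void print_features(void)
{
    cpu_features f = query_features();
    printf("MMX=%d SSE=%d SSE2=%d SSE3=%d 3DNow!=%d 3DNow!ext=%d x64=%d\n",
           f.mmx, f.sse, f.sse2, f.sse3, f.amd_3dnow, f.amd_3dnowext,
           f.long_mode);
}
```

Call `print_features()` from your own `main`. Keeping the decoding separate from the CPUID call means the bit logic can be checked on any machine, and the dispatch decision (SSE2 baseline vs. an SSE3 or 3DNow! path) can be made once at startup rather than per call.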

Regarding the (ick) comments about MMX: don't use MMX on x64 [I don't think the ML64 assembler even supports it!]. I haven't yet seen a single scenario where MMX code runs faster than SSE2 code. The main reason I've heard for wanting to use MMX is that people already have a bunch of x86 ASM code that uses it. I have one word for you: rewrite. Chances are that if you don't rewrite it, your ASM code will be wrong and broken because of x64 ABI restrictions (for example, the prolog, epilog, and unwind-data rules that exception handling depends on), and you won't discover this until you find yourself staring at a broken stack dump, or at a process that was terminated because exception handling failed. If you've already vectorized your code, it probably won't be too difficult to change it from working on 8-byte chunks to 16-byte chunks. You'll get twice as many registers to use, and maybe even a minor perf improvement from better code density. You won't see much more of a perf improvement than that yet, because none of the current generation of x64 chips has a 16-byte-wide SSE unit: they just break each operation into 8-byte pieces inside the CPU itself. But you still save the space of an extra instruction, so it's goodness overall.
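
To make the "rewrite it" advice a bit more concrete, here's a minimal sketch of what widening an 8-byte MMX loop to SSE2 looks like. It uses compiler intrinsics rather than ML64 asm (my choice for the example, not the only way to do it), and the function names are hypothetical. An MMX version of this loop would use `_mm_adds_pu8` (PADDUSB) on 8-byte `__m64` values and would need an `_mm_empty()` (EMMS) afterwards; the SSE2 version below does the same saturating byte add on 16 bytes per instruction in an XMM register, with no EMMS. It assumes a compiler that defines `__SSE2__` (GCC and Clang do on x64) and falls back to a plain scalar loop otherwise.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Scalar reference: a + b, clamped to 255. */
static uint8_t add_sat_u8(uint8_t a, uint8_t b)
{
    unsigned s = (unsigned)a + b;
    return (uint8_t)(s > 255u ? 255u : s);
}

/* dst[i] = saturate(a[i] + b[i]) for i in [0, n). */
void add_sat_u8_array(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                      size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* 16 bytes per iteration, where the MMX loop handled 8. */
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_adds_epu8(va, vb));
    }
#endif
    /* Leftover bytes (or the whole array when SSE2 isn't available). */
    for (; i < n; ++i)
        dst[i] = add_sat_u8(a[i], b[i]);
}
```

The unaligned loads and stores keep the sketch simple; if your buffers are known to be 16-byte aligned, `_mm_load_si128` and `_mm_store_si128` are the aligned alternatives.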

On a slightly different note, Sun's relatively new workstation, the 2100z, is amazingly fast and very quiet, too. My only real complaint is that I needed to add an ancient PCI Matrox G450 to get more than 2 monitors connected... It's kind of fun running Win64 on a machine with a big Sun logo on its side and a Java logo on the front. Sun does make some nice hardware, no matter what your opinion is of their software.

And finally, the disclaimer: please note that all of this information is coming out of my brain, not from a Microsoft lawyer. I can't guarantee its accuracy. My brain's been flaky lately. If I offended anyone (or any company), it wasn't intentional. I'm only expressing my own personal opinion...