Continuing with the Device Emulator: be sure to read "DeviceEmulator V2 - how did we get a 40% improvement in performance?", a great post from Barry Bond, the Device Emulator's architect:
The DeviceEmulator V2 is significantly faster than the V1 emulator you're used to. Most of the performance wins come from a small set of optimizations in the ARM-to-x86 JIT and the MMU emulator. These wins improve raw execution of ARM instructions, so all applications and OSes benefit...
Barry goes on to describe six "simple optimizations" that provided a "substantial performance win":
- Faster Translation Lookaside Buffer (TLB) implementation
- Reduce x86 processor stalls due to mixed code and data
- More efficient interrupt polling
- Optimized memcpy() and memset()
- Optimizing "/Od" Code-Gen from the ARM C/C++ Compiler
- Faster Disassembly of ARM Instructions
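To give a feel for why the first item matters, here is a minimal sketch of a software TLB: a small direct-mapped cache sitting in front of a slower page-table lookup. This is not Barry's code and not how the Device Emulator implements it. The class name `SoftTLB`, the 4 KiB page size, the 64-entry table, and the dictionary standing in for the page tables are all assumptions made for illustration.

```python
PAGE_SHIFT = 12                      # assumption: 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1
TLB_ENTRIES = 64                     # hypothetical size, must be a power of two


class SoftTLB:
    """Direct-mapped cache of virtual-page -> physical-page translations."""

    def __init__(self, page_table):
        self.page_table = page_table         # dict standing in for a real page-table walk
        self.tags = [None] * TLB_ENTRIES     # virtual page currently cached in each slot
        self.frames = [0] * TLB_ENTRIES      # physical page matching each tag
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        vpage = vaddr >> PAGE_SHIFT
        slot = vpage & (TLB_ENTRIES - 1)     # index by the low bits of the page number
        if self.tags[slot] == vpage:         # fast path: one compare, no table walk
            self.hits += 1
        else:                                # slow path: look up and refill the slot
            self.misses += 1
            self.frames[slot] = self.page_table[vpage]  # KeyError models a page fault
            self.tags[slot] = vpage
        return (self.frames[slot] << PAGE_SHIFT) | (vaddr & PAGE_MASK)


tlb = SoftTLB({0x10: 0x80, 0x11: 0x93})
print(hex(tlb.translate(0x10004)))   # first touch of page 0x10: miss
print(hex(tlb.translate(0x10FFC)))   # same page again: hit
print(tlb.hits, tlb.misses)
```

Every emulated ARM load and store needs a translation like this, so the cost of the hit path is paid constantly; that is why a cheaper lookup there can show up as a broad speedup.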