eMbedded Visual C++ ARM Compiler

Larry Bank posted this question in the web chat we had last week:

>>>>>

The current ARM compiler is pretty bad at producing fast code. Is the ARM compiler improving in eVC4 SP3?
The current ARM compiler has 2 major flaws and 1 minor flaw:

1) The REGISTER keyword is ignored. This is critical to producing fast code because the compiler does not always choose the right variables to keep in registers.
2) The favor Fast vs. Small code options don't seem to work. I set it for fastest code and it adds extra branches everywhere to make the code smaller (and slower).
3) There is no way to specify CPU-specific optimizations (e.g. OMAP, StrongARM, Thumb, XScale).

>>>>>

I asked the compiler team and here's what they had to say.

1. The register keyword is only a *hint* to the compiler. It was probably more useful in the days when compilers didn't optimize aggressively. More advanced optimizing compilers such as ours assign registers to variables based on information such as lifetime and number of uses, so the compiler is expected to do a better job of selecting registers on its own. Support for the effects of the register keyword has been deprecated, although the language constraints described in the link below still apply. If the compiler had to honor the keyword as far as it could, it would likely be impeded from generating better code.

More info:

https://msdn.microsoft.com/library/default.asp?url=/library/en-us/vclang98/html/_clang_the_register_storage.2d.class_specifier.asp
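
To make point 1 concrete, here is a minimal sketch in plain C. The function names sum_hinted and sum_plain are made up for illustration and do not come from the compiler team. Both functions compute the same result; whether the generated code differs at all is up to the compiler, since the keyword is only a hint.

```c
#include <stddef.h>

/* Hypothetical example: sums a byte buffer with 'register' hints.
   The compiler is free to ignore the hints and allocate registers itself. */
static unsigned long sum_hinted(const unsigned char *p, size_t n)
{
    register unsigned long total = 0;
    register size_t i;

    for (i = 0; i < n; ++i)
        total += p[i];

    /* The language constraint still applies in C even when the hint is
       ignored: writing '&total' here would be rejected by the compiler. */
    return total;
}

/* Same loop without the hints. An optimizing compiler will normally
   keep 'total' and 'i' in registers here as well. */
static unsigned long sum_plain(const unsigned char *p, size_t n)
{
    unsigned long total = 0;
    size_t i;

    for (i = 0; i < n; ++i)
        total += p[i];

    return total;
}
```

If you suspect the register allocator is making a poor choice, comparing the assembly listings for two variants like these is probably more informative than relying on the keyword.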

2. Many improvements have been made to the ARM compiler's optimizations. However, a code sample for this case may help us spot problems we haven't caught yet.

3. The design is to generate the best code for ARM vs. Thumb vs. XScale based on the three respective -QRxxx compiler flags. There are no known optimization opportunities for OMAP. In general, our compiler tries to maximize compatibility across the ARM family, and it usually turns out that a non-compatible optimization that favors a particular processor doesn't give a win in real-world average cases anyway. Finally, the ARM assembler may be a useful tool for rewriting perf-critical code to use specific hardware features.

[Author: James Pratt]