A Brief History of Phoenix

Earlier I told you that Phoenix exists, but without much additional information. To give you a deeper understanding of the what and why of Phoenix, let's start with two related questions: What can Phoenix do for you? And what can Phoenix do for Microsoft?  We'll start with the second question. 
 
Phoenix at Microsoft
To really answer that question, it's worth talking a bit about the history of compilers and tools at Microsoft (admittedly, this is a shallow and brief version of the history, but it tells the story I need).  Several years ago, the Visual C++ team was cranking out compiler backends, much as we do today, but there was growing concern about the cost of retargeting the backend to generate code for new platforms (such as Itanium, the CLR, POWER, ARM, and later x64).  Retargeting was a painful process, and a small group in the VC team decided it would be worth creating a new backend infrastructure that would allow rapid retargeting to different platforms. 
 
On the other side of the galaxy, in Microsoft Research (MSR), people were doing some of the world's most advanced binary rewriting and static analysis research.  This work took place in different groups with largely overlapping functionality, yet no sharing of code.  When MSR got a whiff of this new project in Visual C++ land, they were intrigued.   Eventually, the Visual C++, MSR, and CLR teams decided they should join forces to create a new project that would be not only a rapidly retargetable backend, but also a new platform for program transformation and analysis.  
 
Additionally, the teams agreed Phoenix should be the code generation platform for all of Microsoft.  At one point there were something like 25 distinct code generators at Microsoft.  For example, the current x86 JIT/NGEN uses a different codebase from the 64-bit JIT/NGEN, which in turn is different from the Visual C++ compiler codebase (though we do share a codebase for the x64, x86, and Itanium Visual C++ compilers).  Microsoft should find a way to leverage the investment in one product across the others, since the core needs and requirements are very similar.  Phoenix will be the foundation for all of this. 
 
Phoenix Outside of Microsoft
Now, why should YOU care about this Phoenix project?  To really appreciate what Phoenix can do for you, let's start with the state of the world today. 
 
Today, compilers are a black box for developers.  You put in code, you throw some compiler switches (/O2, /W4, etc.), and at the end you get a program or library.  As the developer, you don't have much control over what the compiler does.  Furthermore, once you get the final EXE, you have little power to do anything with it besides run it.  Sure, we provide a few utilities such as ILDASM and dumpbin, which let you glean information about the program, but their use is extremely limited.
 
Why is your EXE or DLL so enigmatic?  You're the one who built it, yet analyzing your EXE's behavior or modifying it is a craft left to very few.  If you have ever tried to do binary rewriting, you know that it is not easy.  It's easier with CLR assemblies, but still not nearly as easy as it should be. 
 
Phoenix's job is to make all of this easier.  In fact, we view Phoenix as a platform for program analysis and transformation, and, most importantly, Phoenix is a transparent box.  Phoenix provides many opportunities for code transformation and analysis.
 
Here's the typical flow of source code to program:
 
Source code -> cl.exe -> c1xx.dll -> c2.dll -> link.exe -> program.exe
 
The only place where the developer traditionally has much say is at the source code level.  But with Phoenix, there will be hooks directly into the code generation process, and you can alter your executable even after it has been built. 
 
Examples of Phoenix Use
I'm going to give three examples of things you will be able to do with Phoenix that are rather tough to do today:
 
1) You've come up with a new algorithm that could automatically parallelize C++ programs.  How should you implement it? Today you'd have to either (A) write your own compiler from scratch, which will almost certainly make it a toy compiler (unless you have a small army of developers), or (B) tap into some existing, hard-to-use open-source compiler infrastructure and modify its code.  And if you've ever tried option (B) with the popular existing compiler infrastructures, you know that option (A) starts to look almost easier.
 
Phoenix has a plug-in model in the back end that allows the developer to insert or reorder compiler phases.  Now you can implement a "parallelization" phase (or it may make sense to break it into multiple phases) and simply hook that phase in between existing phases; a rough sketch of the idea appears after this example.  That's pretty darn cool if you ask me.  In future postings I'll go into more detail on how to write plug-ins for Phoenix, how to add a new phase, and how to write Phoenix code to do useful analysis and code transformations.
 
What makes this all the more impressive is that Phoenix will be THE Microsoft compiler.  No more testing your algorithm on toy programs with toy languages.  Instead, you can now compile real applications with real workloads to understand the impact of your parallelization plug-in.
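To make the plug-in idea concrete, here's a minimal, self-contained C++ sketch of the concept.  To be clear, none of these types or names are the actual Phoenix API; they're made up purely to illustrate what it means for a backend to be an ordered list of phases that a plug-in can splice a new phase into.

    #include <cstdio>
    #include <functional>
    #include <string>
    #include <utility>
    #include <vector>

    // Illustrative only: these types are NOT the Phoenix API; they just model
    // the idea of a backend built as an ordered list of phases.
    struct FunctionUnit {            // stand-in for a unit of IR being compiled
        std::string name;
    };

    struct Phase {
        std::string name;
        std::function<void(FunctionUnit&)> execute;
    };

    class PhaseList {
    public:
        void Append(Phase p) { phases_.push_back(std::move(p)); }

        // Insert a new phase immediately after an existing named phase; this is
        // the hook a plug-in would use to splice itself into the pipeline.
        bool InsertAfter(const std::string& existing, Phase p) {
            for (auto it = phases_.begin(); it != phases_.end(); ++it) {
                if (it->name == existing) {
                    phases_.insert(it + 1, std::move(p));
                    return true;
                }
            }
            return false;
        }

        void Run(FunctionUnit& unit) {
            for (auto& phase : phases_) {
                std::printf("running phase '%s' on %s\n",
                            phase.name.c_str(), unit.name.c_str());
                phase.execute(unit);
            }
        }

    private:
        std::vector<Phase> phases_;
    };

    int main() {
        PhaseList backend;
        backend.Append({"lower", [](FunctionUnit&) { /* existing phase */ }});
        backend.Append({"allocate-registers", [](FunctionUnit&) { /* existing phase */ }});

        // The "plug-in": a parallelization phase spliced in between existing phases.
        backend.InsertAfter("lower", {"parallelize", [](FunctionUnit& f) {
            std::printf("  analyzing loops in %s...\n", f.name.c_str());
        }});

        FunctionUnit f{"main"};
        backend.Run(f);
    }

The real Phoenix plug-in model is richer than this, of course, but the shape is the same: you write a phase, and you tell the framework where in the existing pipeline it should run.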
 
2) You're now working on bringing your parallelization framework directly into Visual Studio.  You want to add a new feature to the editor that places red squigglies under code that prevents a loop from being parallelized (or maybe draws arrows between dependencies in the code).  You can use Phoenix to do the analysis on the code - using your dependence analysis package - and then use that information to determine where to put the squigglies in Visual Studio. 
 
And there's even more that Phoenix can do.  For example, Phoenix has support for binary analysis and rewriting.
 
3) You're a consultant at a customer site, debugging an application.   You have some libraries you've written that can do call-stack analysis, but they need to be invoked by the application at entry to each function. 
 
Today, you'd probably need to modify and rebuild the customer's application.  With Phoenix, you can write a simple application that loads the EXE and raises the machine code into the Phoenix Intermediate Representation.  From there, you can add a new function call at the entry to each function.  After that, you instruct Phoenix to lower the modified Intermediate Representation back into an executable file.  It turns out this capability (and other new features) exists in a research project based on Phoenix called Phx.Morph; see https://research.microsoft.com/workshops/aop/ for more information.
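To give a feel for the raise-rewrite-lower flow, here's a toy, self-contained C++ sketch.  Again, these types and function names are invented for illustration; they are not the Phoenix or Phx.Morph API, just a stand-in model of a binary that has been raised into an IR, instrumented at every function entry, and written back out.

    #include <cstdio>
    #include <string>
    #include <vector>

    // Toy model only: stand-ins for a binary raised into an intermediate
    // representation; none of this is the real Phoenix API.
    struct Instruction {
        std::string text;
    };

    struct Function {
        std::string name;
        std::vector<Instruction> body;
    };

    struct Module {
        std::vector<Function> functions;
    };

    // "Raise": pretend we loaded program.exe and lifted it into IR.
    Module Raise(const std::string& path) {
        std::printf("raising %s into IR...\n", path.c_str());
        return Module{{
            {"main",   {{"call helper"}, {"ret"}}},
            {"helper", {{"mov eax, 1"}, {"ret"}}},
        }};
    }

    // The rewrite: insert a call to the consultant's call-stack tracer at the
    // entry of every function in the module.
    void InstrumentEntries(Module& m, const std::string& tracer) {
        for (auto& fn : m.functions) {
            fn.body.insert(fn.body.begin(), Instruction{"call " + tracer});
        }
    }

    // "Lower": pretend we wrote the modified IR back out as an executable.
    void Lower(const Module& m, const std::string& path) {
        std::printf("lowering IR back to %s:\n", path.c_str());
        for (const auto& fn : m.functions) {
            std::printf("  %s:\n", fn.name.c_str());
            for (const auto& ins : fn.body)
                std::printf("    %s\n", ins.text.c_str());
        }
    }

    int main() {
        Module m = Raise("program.exe");
        InstrumentEntries(m, "TraceCallStack");
        Lower(m, "program.instrumented.exe");
    }

With Phoenix doing the real raising and lowering, your job reduces to the middle step: walking the IR and inserting the calls you need.  Which brings me to Phoenix in academia...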
 
Phoenix in Academia
One of the problems with doing compiler research in academia is the lack of a great platform that is both easy to use (so you don't spend your whole semester learning the infrastructure) and real (you shouldn't be limited to compiling toy languages or extremely small subsets of real languages).  With Phoenix you get both in a freely available download (more on this in a future blog entry).  I would have been ecstatic to have something like Phoenix when I was in grad school. 
 
So as you can see, there is some pretty cool stuff that this platform will enable.  I can foresee an ecosystem around building post-link tools and plug-ins, as well as a vibrant compiler research community that will benefit from Phoenix.
 
What would YOU like to do with Phoenix?   We'd love to hear from you...