Using the Roslyn Symbol API

11/23/2011

by Kevin Pilch-Bisson

I’m back again to move along to the next stage of the compiler pipeline and take a look at working with Symbols using the Roslyn CTP.

The Roslyn CTP’s symbol API provides a top-down view of all the symbols available. Before we get to symbols though, we need to start looking at how to give the compiler enough context to tell us about symbols. While syntax trees are mostly context free and can stand on their own, in order to learn about symbols we need to have a collection of syntax trees, the references passed to the compiler, and any options in effect. In Roslyn, we use the Compilation type to group these things together. Conceptually, a Compilation represents one invocation of the csc.exe command line, or a single project in Visual Studio.

Let’s start by creating a compilation that includes something like the simple file we used last time when talking about syntax trees:

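    using System;
    using System.IO;
    using System.Linq;
    using Roslyn.Compilers;
    using Roslyn.Compilers.CSharp;

    class Program
    {
        static void Main(string[] args)
        {
            // Parse the source text into a syntax tree.
            var syntaxTree = SyntaxTree.ParseCompilationUnit(@"class C
    {
        static void Main()
        {
            if (true)
                Console.WriteLine(""Hello, World!"");
        }
    }");

            // Group the tree with the references the compiler needs to resolve symbols.
            var compilation = Compilation.Create("test.dll",
                syntaxTrees: new[] { syntaxTree },
                references: new []
                    {
                        // The assembly that defines object (mscorlib on the desktop framework).
                        new AssemblyFileReference(typeof(object).Assembly.Location),
                        // The assembly that defines Enumerable (System.Core on the desktop framework).
                        new AssemblyFileReference(typeof(Enumerable).Assembly.Location),
                    });
        }
    }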

Aside: You’ll find that many parts of the Roslyn APIs use optional arguments, and it’s convenient to call them with named parameters. The reason for this is that we’re exposing an immutable API, so we want to allow you to specify all the options right at creation time, but we don’t want to force you to type them all. In a mutable model, we’d probably leave many of these things out of the constructors/factory methods and allow you to set them via properties.

Now that we’ve got a Compilation, what are the interesting things to do with it? Well, there are a few.

“Change” it

Like SyntaxTrees and many other parts of the Roslyn API, Compilations are immutable, so we can’t really change them. However, methods like AddReferences() and RemoveSyntaxTrees() return a new Compilation object with those changes applied (just as string.Replace returns a new string rather than modifying the original).
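To make the "change by copying" idea concrete, here is a minimal sketch. It reuses the Compilation.Create and AssemblyFileReference calls from the first listing. The exact parameter shape of AddReferences in the CTP is an assumption (an array of references is passed here), and the extra reference to the assembly that defines Uri is chosen purely for illustration.

```csharp
using System;
using Roslyn.Compilers;
using Roslyn.Compilers.CSharp;

class ImmutableCompilationSketch
{
    static void Main()
    {
        var tree = SyntaxTree.ParseCompilationUnit("class C { }");

        var original = Compilation.Create("test.dll",
            syntaxTrees: new[] { tree },
            references: new[]
            {
                new AssemblyFileReference(typeof(object).Assembly.Location),
            });

        // Assumption: AddReferences accepts a collection of references and
        // returns a new Compilation, leaving 'original' untouched.
        var updated = original.AddReferences(new[]
        {
            new AssemblyFileReference(typeof(Uri).Assembly.Location),
        });

        // If Compilations are immutable as described, these are two distinct objects.
        Console.WriteLine(ReferenceEquals(original, updated));
    }
}
```

Because the original Compilation is never modified, it stays valid for anything that was already using it, and both versions can be queried side by side.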

Find errors and warnings

The Compilation object makes it possible for us to programmatically determine what errors and warnings exist in a piece of code. There are two different APIs for this on the Compilation object. First of all, you can call GetDiagnostics, which returns all of the errors and warnings for the Compilation. This requires doing a significant amount of work (almost as much as just compiling), so it can take a while.

In addition to GetDiagnostics, the Compilation has a method named “GetDeclarationDiagnostics”. This is meant to be a somewhat less expensive call, and one that doesn’t throw away the results of its work. You can think of the declaration diagnostics as all the errors that happen outside the curly braces of method bodies and property accessors. Finding them requires completely filling out the symbol table of the Compilation, but it doesn’t require looking at method bodies. In contrast, GetDiagnostics first calculates the declaration diagnostics, and then binds each method body in turn. Keeping the fully bound form of every method would take a lot of memory for a large Compilation, and so the bound information is released once the diagnostics have been collected.

    var diagnostics = compilation.GetDiagnostics();
    foreach (var d in diagnostics)
    {
        var lineSpan = d.Location.GetLineSpan(usePreprocessorDirectives: true);
        var startLine = lineSpan.StartLinePosition.Line;
        Console.WriteLine("Line {0}: {1}", startLine, d.Info.GetMessage());
    }
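The cheaper declaration-only check described above could be used in much the same way. The following is a sketch, not a definitive usage: it assumes GetDeclarationDiagnostics can be called with no arguments and yields the same kind of diagnostic objects as GetDiagnostics. The helper name ReportDeclarationDiagnostics is hypothetical.

```csharp
using System;
using Roslyn.Compilers;
using Roslyn.Compilers.CSharp;

static class DiagnosticReporting
{
    // Hypothetical helper: prints only the diagnostics found outside method bodies.
    public static void ReportDeclarationDiagnostics(Compilation compilation)
    {
        // Assumption: callable without arguments and enumerable like GetDiagnostics().
        foreach (var d in compilation.GetDeclarationDiagnostics())
        {
            var lineSpan = d.Location.GetLineSpan(usePreprocessorDirectives: true);
            Console.WriteLine("Line {0}: {1}",
                lineSpan.StartLinePosition.Line, d.Info.GetMessage());
        }
    }
}
```

A tool that only cares about the shape of types and members could call this instead of GetDiagnostics and avoid binding every method body.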

Compile it!

We can write out the bytes to an assembly using Emit(). In case we want to Emit if possible, and display diagnostics if not, the Emit call returns a result that includes the diagnostics. This means we can skip a separate call to GetDiagnostics before trying Emit, so no method body has to be examined twice.

    using (var file = new FileStream("test.dll", FileMode.Create))
    {
        var result = compilation.Emit(file);
    }
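The "emit if possible, otherwise show diagnostics" pattern might look like the sketch below. The text above confirms that the result carries the diagnostics; the Success property used to test the outcome is an assumption about the CTP's result type, and the helper name EmitOrReport is hypothetical.

```csharp
using System;
using System.IO;
using Roslyn.Compilers;
using Roslyn.Compilers.CSharp;

static class EmitSketch
{
    // Hypothetical helper: returns true if the assembly was written to 'path'.
    public static bool EmitOrReport(Compilation compilation, string path)
    {
        using (var file = new FileStream(path, FileMode.Create))
        {
            var result = compilation.Emit(file);

            // Assumption: the result exposes a Success flag alongside Diagnostics.
            if (result.Success)
            {
                return true;
            }

            foreach (var d in result.Diagnostics)
            {
                Console.WriteLine(d.Info.GetMessage());
            }
            return false;
        }
    }
}
```

Note that FileMode.Create creates the file before Emit runs, so a failed compilation can leave an empty or partial file at that path; a caller that cares may prefer to delete it when the helper returns false.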

Examine the symbol table

One of the most interesting things we can do is look at the symbol table and examine the types and members inside of it, sort of the way you would do with Reflection.

To examine every type in the Compilation, you can start with the GlobalNamespace property of the Compilation. One thing to note is that the Compilation’s GlobalNamespace is what the compiler uses when doing lookups, so it includes everything defined in the Compilation and in its references. If you want to see only things that are actually defined in the Compilation, you can look at the “Assembly.GlobalNamespace” instead. Let’s look at an example where we find every member that contains the letter “a” in its name:

 ReportMethods(compilation.Assembly.GlobalNamespace);

and

    private static void ReportMethods(NamespaceSymbol namespaceSymbol)
    {
        foreach (var type in namespaceSymbol.GetTypeMembers())
        {
            ReportMethods(type);
        }

        foreach (var childNs in namespaceSymbol.GetNamespaceMembers())
        {
            ReportMethods(childNs);
        }
    }

    private static void ReportMethods(NamedTypeSymbol type)
    {
        foreach (var member in type.GetMembers())
        {
            if (member.CanBeReferencedByName &&
                member.Name.Contains("a"))
            {
                Console.WriteLine("Found {0}", member.ToDisplayString());
            }
        }

        foreach (var nested in type.GetTypeMembers())
        {
            ReportMethods(nested);
        }
    }

Note the handy “CanBeReferencedByName” property, which saves us from including things like property accessor methods, which exist in the symbol table but which users aren’t allowed to actually name. Also, take note of the “ToDisplayString()” extension method, which allows you to generate various forms of symbol names, including some human-readable ones.
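To see the difference between the two global namespaces mentioned earlier, one could count the types reachable from each. This is a rough sketch that uses only the GetTypeMembers and GetNamespaceMembers calls from the listing above; the class and method names are hypothetical, and nested types are deliberately not counted.

```csharp
using System;
using Roslyn.Compilers;
using Roslyn.Compilers.CSharp;

static class NamespaceComparison
{
    // Counts the top-level types in a namespace and all of its child namespaces.
    static int CountTypes(NamespaceSymbol ns)
    {
        int count = 0;
        foreach (var type in ns.GetTypeMembers())
        {
            count++;
        }
        foreach (var child in ns.GetNamespaceMembers())
        {
            count += CountTypes(child);
        }
        return count;
    }

    public static void Compare(Compilation compilation)
    {
        // Only what the source in this Compilation defines.
        Console.WriteLine("Defined here: {0}",
            CountTypes(compilation.Assembly.GlobalNamespace));

        // Everything lookup can see, including types from references.
        Console.WriteLine("Visible to lookup: {0}",
            CountTypes(compilation.GlobalNamespace));
    }
}
```

For the small sample compilation, the first number should be tiny and the second should be much larger, since it includes the referenced framework assemblies.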

Alternatively, if there is a specific type you want to find, you can use the GetTypeByMetadataName method. We needed a way to represent fully qualified type names that include generic arity and nesting. Rather than define our own format, we decided to use the metadata format, which means that to use this API, you’ll need to use + to separate nested types, and encode the generic arity with `n, where n is the number of type parameters. For example, to look up the nested Enumerator type in List<T>, we’d use a string like “System.Collections.Generic.List`1+Enumerator”.
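If you are unsure how to spell a metadata name, ordinary reflection uses the same format and can produce it for you. The sketch below uses only System.Type, not Roslyn; the class and method names are hypothetical. The string it prints is the one the text above passes to GetTypeByMetadataName.

```csharp
using System;
using System.Collections.Generic;

static class MetadataNameDemo
{
    // For an open generic or nested type, Type.FullName is the metadata-style name.
    public static string MetadataName(Type type)
    {
        return type.FullName;
    }

    static void Main()
    {
        string name = MetadataName(typeof(List<>.Enumerator));
        Console.WriteLine(name);
        // prints: System.Collections.Generic.List`1+Enumerator

        // Reflection accepts the same string when looking the type up again.
        Type roundTrip = Type.GetType(name);
        Console.WriteLine(roundTrip == typeof(List<>.Enumerator));
        // prints: True
    }
}
```

This only helps with spelling the name. As discussed below, the System.Type you get back belongs to the runtime your tool is running on, which is not necessarily the one the Compilation targets.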

One question we get is that if this model is similar to reflection, why isn’t it easier to get to the System.Type for a TypeSymbol? The reason is that the compiler may be targeting a different runtime than the one that your code is running on – consider a compilation for a Silverlight project for example. There is no way to load a System.Type for some type in the reference assembly to mscorlib.dll. At best you would find the corresponding System.Type in the desktop CLR, which may or may not be what you need.

Analyze specific parts of it

It may be interesting to pick a particular expression and see what method is going to be invoked, or to ask similar semantic questions. We’ll look at this in more detail in the next post, but if you want to get a head start, take a look at the GetSemanticModel method on Compilation, and its resulting SemanticModel object.