Why aren’t there parsers for CodeDom? [David Gutierrez]

This is an interesting question, and the answer is a bit more than just “historical reasons.”

It’s hard.
Yes, it’s just plain harder to parse code than it is to emit it.  Parsing code is a good portion of what compilers do, and since most compilers are written in native code, a CodeDom parser would probably mean duplicating each compiler’s parser in managed code.  Given how many compilers there are out there, it all adds up to a major amount of work to get parsers written for all of them.

There’s no big customer
As Kit pointed out in an earlier blog post, we do the features that our customers need, and so far there hasn’t been any major customer demanding a code parser.  Since there’s no pressing need, we’d rather spend time on the things for which there is greater demand.  Another way to think about this is that most people get along fine without a parser, which makes sense since most of CodeDom is focused on emitting code.

CodeDom doesn’t fully represent all languages
CodeDom trees only represent a subset of what languages support, meaning there are language features you simply can’t represent with CodeDom. So what happens when you run into something that isn’t supported?  In theory even the subset that CodeDom supports should be rich enough to write all possible programs, and you should always be able to convert to something that works.  So this would come down to being clever about converting what’s in code to what CodeDom can represent.

It’s not quite that simple, though.  There are things that CodeDom just doesn’t support in any way.  One example is providing an explicit implementation for event add-remove methods.  There’s just no CodeDom tree (short of using CodeSnippet*, which isn’t language neutral) that can do that.  Take a look at Vinaya’s excellent post on some common things that CodeDom can’t do for more examples.  The way to fix this is to change CodeDom so that it can represent anything that any language can represent.  Again, that is not easy to do, particularly given how difficult it is to add things to CodeDom in a way that doesn’t break either the people building trees or the providers that emit code.
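To make the "isn’t language neutral" point concrete, here is a minimal sketch in Python.  It is a toy model, not the CodeDom API: Field, Snippet, emit_csharp and emit_vb are hypothetical names invented for this illustration.  A structured node can be rendered correctly by each emitter, while a snippet node carries raw text in one language and passes through every emitter unchanged.

```python
from dataclasses import dataclass

# Toy stand-ins for a language-neutral code tree. These are NOT the real
# System.CodeDom types; every name here is hypothetical.


@dataclass
class Field:
    """A structured node: each emitter knows how to render it."""
    type_name: str
    name: str


@dataclass
class Snippet:
    """An escape hatch: raw source text written in one specific language."""
    text: str


def emit_csharp(node):
    if isinstance(node, Field):
        return f"private {node.type_name} {node.name};"
    if isinstance(node, Snippet):
        return node.text  # passed through untouched
    raise TypeError(f"unsupported node: {node!r}")


def emit_vb(node):
    if isinstance(node, Field):
        return f"Private {node.name} As {node.type_name}"
    if isinstance(node, Snippet):
        return node.text  # still the original language's text, not VB
    raise TypeError(f"unsupported node: {node!r}")


title = Field("String", "title")
# C#-style text for an event with explicit add/remove accessors.
changed = Snippet("public event EventHandler Changed { add { } remove { } }")

for node in (title, changed):
    print(emit_csharp(node))
    print(emit_vb(node))
```

The field comes out differently for each target, as it should.  The snippet comes out as the same C#-style text from both emitters, so a tree that contains it only works for one language.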

On the other hand, there are parsers out there for some languages.  These parsers typically don’t parse into a CodeDom tree, so the difficulty of representing the parsed code in CodeDom is still a major issue.  It’s possible that in a future release we would try to create some kind of parser for CodeDom, though we have no specific plans.  Here’s a VB parser available for download.

So why is there an ICodeParser and a CreateParser method?
The original goal for CreateParser was so that custom code providers could hook up their own parsers if they had one.  There is actually one implementation of an ICodeParser, though it doesn’t strictly parse code.  It exists in Visual Studio as a link in the chain between “CodeModel”, which is how VS models code, and a CodeDom tree, which is how VS generates code.  Why is VS written like this?  That’s for historical reasons. 


Comments (8)

  1. Anonymous says:

    I’ve never used this, but it claims to have a C# grammar:


  2. Anonymous says:

    See also SharpDevelop’s sources.

  3. Anonymous says:

    The idea I was thinking of when I asked the question on the previous post was that if there was a standard path from code -> CodeDom tree, then modeling tools/ refactoring tools would have a much more stable path into VS.Net integration. The customer base would be 3rd party tool vendors. Still pretty small, I’ll admit. Or perhaps I’m being naive about the whole concept.

  4. Anonymous says:

    me – I’m not sure what you’re asking. We don’t have a standard way to represent code in CodeDom. It seems like you’re saying the 3rd party tools would parse code into CodeDom, do some transformations, and re-emit it. Given the problems of parsing and the restrictiveness of the CodeDom output, I don’t think this would be an acceptable solution for most code tools.

  5. Anonymous says:

    Why no parser is provided for CodeDOM