Why aren't there parsers for CodeDom? [David Gutierrez]

03/16/2005

This is an interesting question, and the answer is a bit more than just "historical reasons."

It's hard.
Yes, it's just plain harder to parse code than it is to emit it. Parsing code is a good portion of what compilers do, and since most compilers are written in native code, this probably means duplicating the parser in managed code. Given how many compilers there are out there, it all adds up to a major amount of work to get parsers written for all of them.

There's no big customer
As Kit pointed out in an earlier blog post, we do the features that our customers need, and so far there hasn't been any major customer demanding a code parser. Since there's no pressing need, we'd rather spend time on those things for which there is a greater demand. Another way to think about this is that most people get along fine without a parser, which makes sense since most of CodeDom is focused around emitting code.

CodeDom doesn't fully represent all languages
CodeDom trees only represent a subset of what languages support, meaning there are language features you simply can't represent with CodeDom. So what happens when you run into something that isn't supported? In theory even the subset that CodeDom supports should be rich enough to write all possible programs, and you should always be able to convert to something that works. So this would come down to being clever about converting what's in code to what CodeDom can represent.
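As a small illustration of that kind of conversion: CodeDom has no unary "not" operator, so a parser that met `!ready` in source would have to rewrite it as a comparison against `false`. The sketch below builds that equivalent tree and emits it as C#. The variable name `ready` is made up for the example, and the exact formatting of the output depends on the provider.

```csharp
using System;
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using Microsoft.CSharp;

class NotOperatorWorkaround
{
    static void Main()
    {
        // There is no CodeDom node for "!ready", so express it as "ready == false".
        // "ready" is a hypothetical variable name used only for illustration.
        var notReady = new CodeBinaryOperatorExpression(
            new CodeVariableReferenceExpression("ready"),
            CodeBinaryOperatorType.ValueEquality,
            new CodePrimitiveExpression(false));

        using (var provider = new CSharpCodeProvider())
        using (var writer = new StringWriter())
        {
            // Emit the expression as C# source text.
            provider.GenerateCodeFromExpression(notReady, writer, new CodeGeneratorOptions());
            Console.WriteLine(writer.ToString());
        }
    }
}
```

The emitted code means the same thing as the original, but it no longer looks like what the author wrote, which is part of why round-tripping source through CodeDom is awkward.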

It's not quite that simple, though. There are things that CodeDom just doesn't support in any way. One example is providing an explicit implementation for event add and remove methods. There's just no CodeDom tree that can do that, short of using CodeSnippet* (which isn't language neutral). Take a look at Vinaya's excellent post on some common things that CodeDom can't do for more examples. The way to fix this would be to change CodeDom so that it can represent anything that any language can represent. Again, that's not something that is easy to do, particularly given how difficult it is to add things to CodeDom in a way that doesn't break either the people building trees or the providers that emit code.
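To make the event example concrete, here is a minimal sketch of the snippet escape hatch. `CodeMemberEvent` can declare an event, but it has no place to put accessor bodies, so the accessors go in as raw C# text through `CodeSnippetTypeMember`. The class name `Publisher` and the member names are hypothetical, chosen only for this example.

```csharp
using System;
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using Microsoft.CSharp;

class SnippetEscapeHatch
{
    static void Main()
    {
        // Hypothetical type used only for illustration.
        var type = new CodeTypeDeclaration("Publisher");

        // Raw C# text: CodeDom passes it through verbatim and does not understand it.
        // A VB provider would emit this same C# text, which is why it isn't language neutral.
        var accessors = new CodeSnippetTypeMember(
            "    private System.EventHandler handlers;\n" +
            "    public event System.EventHandler Changed\n" +
            "    {\n" +
            "        add { handlers += value; }\n" +
            "        remove { handlers -= value; }\n" +
            "    }\n");
        type.Members.Add(accessors);

        using (var provider = new CSharpCodeProvider())
        using (var writer = new StringWriter())
        {
            // Emit the type declaration, including the verbatim snippet, as C# source.
            provider.GenerateCodeFromType(type, writer, new CodeGeneratorOptions());
            Console.WriteLine(writer.ToString());
        }
    }
}
```

A parser targeting CodeDom would face the reverse problem: on meeting custom accessors in source code, the only node it could produce is an opaque snippet like this one.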

On the other hand, there are parsers out there for some languages. These parsers typically don't parse into a CodeDom tree, so the difficulty of representing the parsed code in CodeDom is still a major issue. It's possible that in a future release we would try to create some kind of parser for CodeDom, though we have no specific plans. Here's a VB parser available for download.

So why is there an ICodeParser and a CreateParser method?
The original goal for CreateParser was so that custom code providers could hook up their own parsers if they had one. There is actually one implementation of an ICodeParser, though it doesn't strictly parse code. It exists in Visual Studio as a link in the chain between "CodeModel", which is how VS models code, and a CodeDom tree, which is how VS generates code. Why is VS written like this? That's for historical reasons.
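A rough sketch of that hook, under some assumptions: `ToyParser` and `ToyProvider` are hypothetical names, and the toy parser ignores its input entirely. The sketch also assumes a framework version in which `CreateParser` is marked obsolete and the stock C# provider supplies no parser (so the call returns null); treat both as assumptions to check against the version you are running.

```csharp
using System;
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using Microsoft.CSharp;

// Hypothetical parser: ignores the source text and returns an empty compile unit.
class ToyParser : ICodeParser
{
    public CodeCompileUnit Parse(TextReader codeStream)
    {
        return new CodeCompileUnit();
    }
}

// Hypothetical provider that hooks the toy parser up through CreateParser.
class ToyProvider : CSharpCodeProvider
{
    [Obsolete("Mirrors the obsolete base member.")]
    public override ICodeParser CreateParser()
    {
        return new ToyParser();
    }
}

class ParserProbe
{
    static void Main()
    {
#pragma warning disable 618 // CreateParser is assumed to be marked obsolete
        ICodeParser stock = new CSharpCodeProvider().CreateParser();
        ICodeParser custom = new ToyProvider().CreateParser();
#pragma warning restore 618

        // Report which providers actually hand back a parser.
        Console.WriteLine(stock == null
            ? "stock provider: no parser"
            : "stock provider: " + stock.GetType().Name);
        Console.WriteLine("custom provider: " + custom.GetType().Name);
    }
}
```

The hook itself is trivial. All of the difficulty described above lives inside `Parse`.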
