A close look at the Razor Parse Tree

10/11/2010

This post is part of a series that takes a deep dive into the inner workings and extensibility of the Razor parser. Here we take a detailed look at the parse tree that the parser generates.

To learn more about how the parser works, please see the post by Andrew.

The Razor parser starts every document in markup mode (a MarkupBlock). Depending on what comes next, the parser switches between markup mode and code mode. This post walks through the different kinds of markup/code switches that can happen. There is also a tool you can download that generates the parse tree for a given Razor input. The tool works for both C# and VB.

So let's dive into the building blocks.

At a high level, the parse tree consists of Blocks, which indicate what kind of construct the parser is parsing. Blocks are the non-leaf nodes of the parse tree. Blocks can contain other Blocks, which eventually terminate in Spans.

The following diagram shows the high-level structure of the Razor parse tree.

BLOCKS

Blocks can be of the following types

Block Type	Explanation	Example
Markup	The type in which the parser starts parsing the document. It consists of HTML markup text.	Hello world
Section	The parser is parsing the @section construct	@section Foo{}
Template	The parser is parsing an inline template, such as the ones defined for the WebGrid helper	@grid.GetHtml(columns: grid.Columns(grid.Column("Description", format:@<i>@item.Description</i>)))
Statement	The parser is parsing a statement block	@{ int x=1; }
Directive	The parser is parsing a top-level directive of the page	@using Microsoft.Web.Helpers
Functions	The parser is parsing the functions block in the file	@functions{}
Expression	The parser is parsing an expression block	@System.DateTime.Now
Helper	The parser is parsing the definition of a @helper construct	@helper HelperName(){}
Comment	The parser is parsing a block comment, which can comment out both markup and code	@* comments which have markup and code Markup: Hi Code: @{} *@

Blocks contain the following information

BlockType: One of the above kinds

SourceLocation: The location of the character in the file where the block starts, printed as (AbsoluteIndex:LineIndex,CharIndex)::Length of Block
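The sample dump later in this post prints locations such as (8:0,8)::6, meaning absolute index 8, line 0, character 8, length 6. As a minimal sketch of how that notation could be read back into numbers, the Python snippet below parses it with a regular expression. The names Location and parse_location are hypothetical helpers invented for this illustration; they are not part of the Razor parser or the tool.

```python
import re
from typing import NamedTuple


class Location(NamedTuple):
    """Hypothetical holder for one printed location."""
    absolute_index: int
    line_index: int
    char_index: int
    length: int


# Matches the printed form "(AbsoluteIndex:LineIndex,CharIndex)::Length".
_LOCATION = re.compile(r"\((\d+):(\d+),(\d+)\)::(\d+)")


def parse_location(text: str) -> Location:
    """Extract the first location found in a line of the dump."""
    match = _LOCATION.search(text)
    if match is None:
        raise ValueError(f"no location found in {text!r}")
    return Location(*(int(group) for group in match.groups()))


loc = parse_location("Expression Block at (8:0,8)::6")
print(loc.absolute_index, loc.line_index, loc.char_index, loc.length)
```

The start of a node plus its length gives the index just past its last character, which is where the next sibling starts.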

SPANS

Blocks are divided into Spans of the following kinds. Think of Spans as the leaf nodes of the parse tree.

Spans carry the position of the span (line, column) and the content that was parsed. This is useful for error reporting and for syntax highlighting in the editor.

SpanKind	Explanation	Example
Transition	The parser parsed the @ character	@ in @{}
MetaCode	The characters that start and end a block	{} in @{} for a Statement block; () in @() for an Expression block
Comment	All content in a Comment block	foo in @* foo *@ is a Comment span
Code	All the code inside a Statement/Expression block	System.DateTime.Now in @System.DateTime.Now
Markup	All the markup content in a Markup block	<p></p> in “@{} <p></p>”
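The position information that spans carry is what lets an error message or an editor point at a specific line and column. As a rough illustration, the Python sketch below derives a zero-based (line, column) pair from an absolute index. It assumes that lines end at "\n" and that counting is zero-based, as in the sample dump in this post; the function name line_and_column is hypothetical, and the real parser may track positions differently.

```python
from typing import Tuple


def line_and_column(source: str, absolute_index: int) -> Tuple[int, int]:
    """Return the zero-based (line, column) of absolute_index in source."""
    if not 0 <= absolute_index <= len(source):
        raise IndexError(f"index {absolute_index} is outside the source")
    # Assumption: every "\n" before the index starts a new line.
    line = source.count("\n", 0, absolute_index)
    # rfind returns -1 when there is no earlier "\n", so line_start becomes 0.
    line_start = source.rfind("\n", 0, absolute_index) + 1
    return line, absolute_index - line_start


source = "Hello\r\n@(1+1)"
print(line_and_column(source, 7))  # position of the @ on the second line
```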

By now you should have a high-level idea of the structure of the parse tree. After a quick summary of what each Span records, I will walk you through the parse tree generated for a sample Razor input.

Spans contain the following information

SpanKind: One of the above kinds

SourceLocation: The location of the character in the file where the span starts, printed as (AbsoluteIndex:LineIndex,CharIndex)::Length of Span

Content: Content which is parsed as a Span
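To make the Block/Span relationship concrete, here is a small Python model of the tree: Blocks are inner nodes with children, and Spans are leaves that hold content. This is only an illustrative sketch. The class and field names are assumptions chosen for readability, not the types used by the Razor parser, and the real parser may emit additional spans that this simplified example omits.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Union


@dataclass
class Span:
    """Leaf node: a kind, a start offset, and the text it covers."""
    kind: str      # e.g. "Transition", "MetaCode", "Code", "Markup", "Comment"
    start: int     # absolute index of the first character
    content: str

    @property
    def length(self) -> int:
        return len(self.content)


@dataclass
class Block:
    """Inner node: a block type plus child Blocks and Spans."""
    block_type: str
    start: int
    children: List[Union["Block", Span]] = field(default_factory=list)

    @property
    def length(self) -> int:
        # A block is as long as everything it contains.
        return sum(child.length for child in self.children)

    def spans(self) -> Iterator[Span]:
        """Yield the leaf spans in document order."""
        for child in self.children:
            if isinstance(child, Span):
                yield child
            else:
                yield from child.spans()


# "@System.DateTime.Now" from the tables above: a Transition span for the @
# followed by a Code span, wrapped in an Expression block.
expression = Block("Expression", 0, [
    Span("Transition", 0, "@"),
    Span("Code", 1, "System.DateTime.Now"),
])
document = Block("Markup", 0, [expression])

print(document.length)
print("".join(span.content for span in document.spans()))
```

Because only spans hold text, joining the span contents in order gives back the original input.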

Sample Input for a Razor C# file

1 + 1 = @(1+1)

Generated parse tree

Markup Block at (0:0,0)::16
- Markup Span [V;Any] at (0:0,0)::8 - [1 + 1 = ] (Document)
- Expression Block at (8:0,8)::6
  - Transition Span [V;None] at (8:0,8)::1 - [@]
  - MetaCode Span [V;None] at (9:0,9)::1 - [(]
  - Code Span [V;Any] at (10:0,10)::3 - [1+1] - [Terminator: <>]
  - MetaCode Span [V;None] at (13:0,13)::1 - [)]
- Markup Span [V;Any] at (14:0,14)::2 - [\r\n] (Document)
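The numbers in this dump are internally consistent: each span starts where the previous one ended, and the lengths add up to the 16 characters of the input (including the trailing \r\n). The Python sketch below checks this by hand. The span list is transcribed from the dump above, and check_spans is a hypothetical helper written only for this illustration.

```python
from typing import List, Tuple

source = "1 + 1 = @(1+1)\r\n"

# (kind, absolute index, content), transcribed from the dump above.
spans: List[Tuple[str, int, str]] = [
    ("Markup", 0, "1 + 1 = "),
    ("Transition", 8, "@"),
    ("MetaCode", 9, "("),
    ("Code", 10, "1+1"),
    ("MetaCode", 13, ")"),
    ("Markup", 14, "\r\n"),
]


def check_spans(source: str, spans: List[Tuple[str, int, str]]) -> int:
    """Verify the spans tile the source without gaps; return the total length."""
    position = 0
    for kind, start, content in spans:
        if start != position:
            raise ValueError(f"{kind} span starts at {start}, expected {position}")
        if source[start:start + len(content)] != content:
            raise ValueError(f"{kind} span content does not match the source")
        position += len(content)
    if position != len(source):
        raise ValueError("spans do not cover the whole source")
    return position


print(check_spans(source, spans))                              # total length
print(sum(len(content) for _, _, content in spans[1:5]))       # Expression block
```

The first value corresponds to the ::16 on the Markup Block, and the second to the ::6 on the Expression Block.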

As you know, the Razor parser starts parsing in a MarkupBlock, so in this case the first block is the MarkupBlock. After creating the MarkupBlock, the parser sees that the next characters are markup, so it creates a MarkupSpan and puts all the markup content into this span.

When the parser sees @, it knows that the characters that follow are code, so it creates an ExpressionBlock. After creating the ExpressionBlock, the parser parses the @ as a TransitionSpan, which marks the transition from markup to code. In this form the ExpressionBlock has the signature @(), so the parser parses the ( as a MetaCodeSpan. The parser then parses the remaining characters as a CodeSpan until it sees the terminator character ), which is parsed as a MetaCodeSpan.

After the closing ), the ExpressionBlock has nothing else to parse, so the parser consumes the newline characters as a MarkupSpan in the enclosing MarkupBlock.
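The walkthrough above can be imitated with a toy scanner. The Python sketch below handles only the explicit @( ... ) form and stops at the first closing parenthesis, so it ignores nested parentheses, string literals, escapes, and every other Razor construct. It is not how the Razor parser is implemented; split_explicit_expressions is a hypothetical function that only mirrors the order in which the spans are produced.

```python
from typing import List, Tuple


def split_explicit_expressions(source: str) -> List[Tuple[str, int, str]]:
    """Split source into (kind, absolute index, content) spans."""
    spans: List[Tuple[str, int, str]] = []
    position = 0
    while position < len(source):
        start = source.find("@(", position)
        if start == -1:
            # No more transitions: the rest of the input is markup.
            spans.append(("Markup", position, source[position:]))
            break
        if start > position:
            spans.append(("Markup", position, source[position:start]))
        end = source.find(")", start + 2)
        if end == -1:
            raise ValueError("unterminated @( expression")
        spans.append(("Transition", start, "@"))
        spans.append(("MetaCode", start + 1, "("))
        if end > start + 2:
            spans.append(("Code", start + 2, source[start + 2:end]))
        spans.append(("MetaCode", end, ")"))
        position = end + 1
    return spans


for kind, index, content in split_explicit_expressions("1 + 1 = @(1+1)\r\n"):
    print(f"{kind} at {index}: {content!r}")
```

For the sample input, the spans come out in the same order and at the same absolute indexes as in the generated parse tree.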

ParseTreeViewer

If you found the above description of the parse tree that gets generated for Razor syntax interesting, you should download this tool, which generates the parse tree for a given Razor input.

This tool can be used for debugging your application, though I would say it is an advanced use. If you think that the parser is not parsing the input as expected, then you can use this tool to see the parse tree that gets generated and figure out what is wrong.

Screenshot of the tool

Download

FAQ

1. You need to have ASP.NET Web Pages installed on the machine.

2. If you select the “View In Browser” option, the tool generates a temp file named “test.htm”.

Hopefully this helps you understand the structure of the parse tree generated for Razor syntax.