Only a few hours left (part 3)

09/13/2005

The previous post on this topic left us with a problem statement: how do we design an internal structure that is easily consumable from many different locations, without weighing it down with orthogonal, unnecessary functionality? At the end of that post we also discussed how the ReplaceTypeCodeWithClass refactoring had several benefits and drawbacks. Let's look at how that technique would address the code example we've been using so far.

When we replace the type switch with separate classes, we'll end up with a hierarchy that looks like this (simplified):

We now have a flexible type system that encodes the identifying information in the type itself instead of in the discriminant ID field. This carries a lot of benefits (not the least of which is that there is now compile-time type checking on these types); however, it also (initially) carries some drawbacks. What would our parser code look like now?

    public class Token {
        // Virtual so that each token subclass can supply its own behavior.
        public virtual void DetermineWhichTypeToParse(Parser parser) {
            //error
        }
    }

    public class InterfaceToken : KeywordToken {
        public override void DetermineWhichTypeToParse(Parser parser) {
            parser.parseInterface();
        }
    }

    public class ClassToken : KeywordToken {
        public override void DetermineWhichTypeToParse(Parser parser) {
            parser.parseClass();
        }
    }

    public class Parser {
        Token CurrentToken { get { ... } }

        void parseType() {
            parseModifiers();

            // Dispatches to the override on the token's runtime type.
            CurrentToken.DetermineWhichTypeToParse(this);

            //Parse rest of type
        }
    }

Sure, it works. But *bleagh*. Now our nice token hierarchy is cluttered with parser knowledge that it should know nothing about. On top of that, it's quite possible that to pull this off i'd have to expose private, parser-specific functionality (i.e. make my private parser functions internal). This is really not the path that i want to go down. I want to have this nice rich type system, and i want to be able to use it in a flexible manner, but i don't want to end up with ugly code like the above.

Is there a solution? Luckily, yes, there are many. One is multi-methods (which are already available in .NET, albeit not in a clear form); another is to implement the well-known visitor pattern on this new token hierarchy. So what would that look like? Well, we'd start with the following code:

    public interface ITokenVisitor {
        void VisitToken(Token token);
        void VisitKeywordToken(KeywordToken token);
        void VisitInterfaceToken(InterfaceToken token);
        void VisitClassToken(ClassToken token);
        void VisitIdentifierToken(IdentifierToken token);
        void VisitContextualKeywordToken(ContextualKeywordToken token);
        void VisitAccessibilityToken(AccessibilityToken token);
        void VisitNoisyToken(NoisyToken token);
        void VisitCommentToken(CommentToken token);
        void VisitWhitespaceToken(WhitespaceToken token);
    }

    public class DefaultTokenVisitor implements ITokenVisitor {
        public void Default(Token token) {
        }

        public void VisitToken(Token token) {
            Default(token);
        }

        public void VisitKeywordToken(KeywordToken token) {
            Default(token);
        }

        /* all further implementation just defers to "Default" */
    }

    public class Token {
        public void AcceptVisitor(ITokenVisitor visitor) {
            visitor.VisitToken(this);
        }
    }

    public class KeywordToken extends Token { ... }
    public class InterfaceToken extends KeywordToken { ... }
    public class ClassToken extends KeywordToken { ... }
    public class IdentifierToken extends Token { ... }
    public class ContextualKeywordToken extends IdentifierToken { ... }
    public class AccessibilityToken extends KeywordToken { ... }
    public class NoisyToken extends Token { ... }
    public class CommentToken extends NoisyToken { ... }
    public class WhitespaceToken extends NoisyToken { ... }
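The bodies of the token subclasses are elided above. For the visitor to reach the specific Visit method, each subclass would presumably override AcceptVisitor and call the method that matches its own type. The following is only a minimal sketch of that idea, under the assumption that this is what the elided bodies do. It trims the hierarchy to three classes, and RecordingVisitor and AcceptVisitorDemo are hypothetical names introduced purely for illustration.

```java
// Trimmed, hypothetical version of the hierarchy: three token classes only.
interface ITokenVisitor {
    void VisitToken(Token token);
    void VisitKeywordToken(KeywordToken token);
    void VisitClassToken(ClassToken token);
}

class Token {
    public void AcceptVisitor(ITokenVisitor visitor) {
        visitor.VisitToken(this);
    }
}

class KeywordToken extends Token {
    @Override
    public void AcceptVisitor(ITokenVisitor visitor) {
        // Assumed override: routes to the Visit method for this exact type.
        visitor.VisitKeywordToken(this);
    }
}

class ClassToken extends KeywordToken {
    @Override
    public void AcceptVisitor(ITokenVisitor visitor) {
        visitor.VisitClassToken(this);
    }
}

// Hypothetical visitor that records which Visit method was reached.
class RecordingVisitor implements ITokenVisitor {
    String lastVisited = "";

    public void VisitToken(Token token) { lastVisited = "Token"; }
    public void VisitKeywordToken(KeywordToken token) { lastVisited = "KeywordToken"; }
    public void VisitClassToken(ClassToken token) { lastVisited = "ClassToken"; }
}

class AcceptVisitorDemo {
    public static void main(String[] args) {
        Token token = new ClassToken(); // the static type is just Token
        RecordingVisitor visitor = new RecordingVisitor();
        token.AcceptVisitor(visitor);
        System.out.println(visitor.lastVisited); // prints "ClassToken"
    }
}
```

Even though the variable is declared as a plain Token, the virtual call to AcceptVisitor lands in ClassToken, which then picks VisitClassToken. Without such overrides, every token would end up in VisitToken.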

Pretty standard Visitors right? Yup, nothing special about them. Except... these visitors are written in Java. Why Java? Well, as it turns out, Java has a very nice language construct that makes using visitors quite handy. Let's take a look at how our parser code would look in Java:

    public class Parser {
        Token getCurrentToken() { ... }

        void parseType() {
            parseModifiers();

            getCurrentToken().AcceptVisitor(new DefaultTokenVisitor() {
                public void VisitClassToken(ClassToken token) {
                    Parser.this.parseClass();
                }

                public void VisitInterfaceToken(InterfaceToken token) {
                    Parser.this.parseInterface();
                }

                /* Further cases */

                public void Default(Token token) {
                    //handle error
                }
            });
        }
    }

Here i've used Java's "Anonymous Inner Classes" to trivially create a visitor that allows me to drive my parser on top of these new tokens. Specifically, the visitor says that when it "visits" a "class" token, the outer parser ("Parser.this") should start parsing a class, and likewise with an interface token. Any other token (beyond enum/delegate) will cause an error (recall that all Visit methods in DefaultTokenVisitor defer to the Default method, which we've overridden in our anonymous inner class). As you can see, this structure is completely isomorphic to the "switch" statement we saw in the original post. Here it is again for reference:

    public class Parser {
        ...
            switch (CurrentToken.ID) {
                case TokenID.Class:
                    parseClass();
                    break;
                case TokenID.Interface:
                    parseInterface();
                    break;
                /* Further cases */

                default:
                    //Handle errors
                    break;
            }
        }
    }

The "switch (CurrentToken.ID)" corresponds to the "getCurrentToken().AcceptVisitor" call.
The "case" statements correspond to the overridden "Visit" methods in the anonymous inner class.
The "default" case corresponds to the overridden "Default" method in the anonymous inner class.

Seems great! We now have a convenient hierarchy for describing tokens in a type safe manner, and we have a Visitor system that allows us to use them flexibly without clutter, while also allowing the code around the token handling to be self-describing. i.e. i can easily look at the anonymous visitor and see what it's doing.
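To make the pieces concrete, here is a small end-to-end sketch of the anonymous-visitor approach. It is an illustration under stated assumptions, not the real parser: the hierarchy is cut down to Token, ClassToken and InterfaceToken (both deriving directly from Token for brevity), the token subclasses are assumed to override AcceptVisitor, and the "actions" list, the reportError method and ParserDemo are hypothetical additions that exist only so the behavior can be observed.

```java
import java.util.ArrayList;
import java.util.List;

interface ITokenVisitor {
    void VisitToken(Token token);
    void VisitClassToken(ClassToken token);
    void VisitInterfaceToken(InterfaceToken token);
}

// Every Visit method defers to Default unless a subclass overrides it.
class DefaultTokenVisitor implements ITokenVisitor {
    public void Default(Token token) { }
    public void VisitToken(Token token) { Default(token); }
    public void VisitClassToken(ClassToken token) { Default(token); }
    public void VisitInterfaceToken(InterfaceToken token) { Default(token); }
}

class Token {
    public void AcceptVisitor(ITokenVisitor visitor) { visitor.VisitToken(this); }
}

class ClassToken extends Token {
    @Override
    public void AcceptVisitor(ITokenVisitor visitor) { visitor.VisitClassToken(this); }
}

class InterfaceToken extends Token {
    @Override
    public void AcceptVisitor(ITokenVisitor visitor) { visitor.VisitInterfaceToken(this); }
}

class Parser {
    private final Token currentToken;
    // Hypothetical trace of what the parser did, for demonstration only.
    final List<String> actions = new ArrayList<>();

    Parser(Token currentToken) { this.currentToken = currentToken; }

    Token getCurrentToken() { return currentToken; }

    // These stay private: the anonymous inner class can still reach them.
    private void parseClass() { actions.add("class"); }
    private void parseInterface() { actions.add("interface"); }
    private void reportError() { actions.add("error"); }

    void parseType() {
        getCurrentToken().AcceptVisitor(new DefaultTokenVisitor() {
            @Override
            public void VisitClassToken(ClassToken token) { Parser.this.parseClass(); }

            @Override
            public void VisitInterfaceToken(InterfaceToken token) { Parser.this.parseInterface(); }

            @Override
            public void Default(Token token) { Parser.this.reportError(); }
        });
    }
}

class ParserDemo {
    public static void main(String[] args) {
        Parser parser = new Parser(new InterfaceToken());
        parser.parseType();
        System.out.println(parser.actions); // prints "[interface]"
    }
}
```

Note that the parse methods remain private in this sketch. Because the anonymous visitor is declared inside Parser, it can call them directly, so the tokens never need to know anything about the parser.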

So we're done right? This is the path we should go down? Well... not yet... there's still one unanswered question: Why was i using Java to demonstrate this style of development?
