Five-Dollar Words For Programmers, Part Three: Homoiconic


Jeff Atwood was kind enough to once more give me the shout-out in his blog the other day. Thanks Jeff!

This inspires me to continue my series on five-dollar words for programmers. Here’s one that I only learned relatively recently, when I helped write the code that translates a lambda expression into an expression tree which represents the content of the lambda: homoiconic.

A language is said to be homoiconic if the representation of the program can be seen as a data structure expressible in that language. With expression lambdas being convertible to expression trees (which can then be compiled into code at runtime), C# 3.0 is somewhat homoiconic. But it pales in comparison to, say, LISP, where pretty much everything you can do in the language you can also represent as structurally isomorphic data.
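For instance, the very same lambda body can become either a delegate (plain code) or an expression tree (data), and the tree can be inspected and then compiled back into code at runtime:

```csharp
using System;
using System.Linq.Expressions;

static class HomoiconicDemo
{
    // The lambda as code: an ordinary delegate, opaque at runtime.
    public static readonly Func<int, int> AsCode = x => x * 2;

    // The very same lambda as data: an expression tree we can walk.
    public static readonly Expression<Func<int, int>> AsData = x => x * 2;

    // Inspect the data form: the body of x => x * 2 is a Multiply node.
    public static ExpressionType BodyNodeType() => AsData.Body.NodeType;

    // Turn the data back into executable code.
    public static int RunCompiled(int n) => AsData.Compile()(n);
}
```

Here BodyNodeType() returns ExpressionType.Multiply and RunCompiled(21) returns 42: the same program fragment is simultaneously runnable and inspectable.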

Something I personally would like to see more of in C# in the future is greater homoiconicity. We could extend expression trees to statement trees, declaration trees, program trees, and so on. This series of steps would enable increasingly powerful and interesting metaprogramming scenarios.

C# suffers from the lack of a metalanguage. We absolutely do not want to go towards the horrible and primitive metaprogramming exemplified by the C preprocessor. We already have a great language, C#, so why not make C# its own metalanguage? And once we do that, we get into some truly strange loops, where we could use those data structures in the compiler implementation itself!

This is a long way off and might never happen, but a guy can dream.

Comments (24)

  1. configurator says:

    Exactly.

    What made Scheme so powerful (and probably Lisp too, but I’ve never delved into Lisp) was the fact that you could simply add a ‘ before any command, statement, or complete program, and it would become an expression tree, which you could later execute using (eval expression).

    So instead of doing

    (define (double x)

      (* x 2))

    you could

    (define (double x)

       (eval `(* ,x 2)))

    and it would insert the value of x in the code – and if the value of x was the symbol y, this would become (* y 2), doubling the global y and not x.

    When you need context, you’d have

    (define-macro (add-vars x y)

       `(+ ,x ,y))

    or something like that – or the more complex yet still easy to control define-syntax.

    I don’t remember it all, since it was a *long* time ago, but it enabled beautiful things. My favourite example of the power of Scheme was the amb macro, which was defined as a 20-line or so macro. It would accept any number of possible values and try them all until it finds one that doesn’t fail, enabling logic programming – for example, what’s the first prime between 20 and 30?

    (define (first-prime-between-20-and-30)

      (let ([x (amb 20 21 22 23 24 25 26 27 28 29 30)])

         (if (is-prime? x)

            x

            (amb-fail))))

    This is a beautiful use of the language. If you enable macros and metaprogramming in C# 5, it would be an *extremely* powerful language.

    But only if you manage to keep it simple yet powerful.

  2. The lack of a metalanguage in C# is a major failing, please read my blog rant from almost a year ago.

    http://johnmelville.spaces.live.com/blog/cns!79D76793F7B6D5AD!163.entry

    Interestingly, Visual Studio provides T4, which implements meta-programming as an IDE feature.  C# 3.5 is one of the many "dialects" of meta-languages supported by T4.  Unfortunately the IDE experience leaves something to be desired with regard to IntelliSense and the like.  It also requires a partial class to separate the metaprogrammed and non-metaprogrammed parts of a given class.  On the whole, though, it gives a very sweet taste of what a good meta-programming system could add to the (already excellent) C# language.

  3. Denis says:

    .NET already has CodeDOM, which does almost the same thing: I say ‘almost’ because it does not create a direct representation of a program, but a representation of the program’s source code, in the language of your choosing. Later, this source code can be compiled in memory and loaded on the fly. I believe WF does a lot of the same (although I do not pretend to know for sure: it’s just that back in 2006 I wrote tons of very similar code for a company whose engineers had created XML representations of the workflows used to automate their hardware testing, and brought me aboard to create a sort of ‘visual studio’ to make those workflows run, WF being in a very early beta back then).

    My intuition tells me that the expression trees are more ‘immediate’, although, once again, I do not pretend to understand in what way, exactly: you still need to call the Compile method of your LambdaExpression object, right?

    But, if C# does eventually become its own ‘meta-language’, the Business Process Management dream may come true: just define the English (or any other ‘human’ language) keywords, like, ‘Incident’, ‘Change Request’, ‘Asset’, etc., as data types, then make the verbs like ‘deploy’ or ‘break-fix’ invoke the related workflows (for example) and — voila! — you have RUNNABLE DOCUMENTS, in English, or in your business’ native language, which the Big Bosses write to define the ‘standard operating procedures’ for their business and you then run for them! That should dwarf both WF and Azure, both by ‘being cool’ and by selling around the world… 🙂

  4. Karellen says:

    +1 Douglas Hofstadter reference.

  5. > The lack of a metalanguage in C# is a major failing

    It’s not; it’s merely an indicator that it will take some more time for metaprogramming to become mainstream. Be patient. We’ve already got lambdas and type inference, and look at the poor Java guys who have neither 🙂

    > T4, which implements meta-programming

    T4 effectively is a macro system, which is a very weak form of metaprogramming. In truth, for anything serious, you want a proper language to work with (and it’s best if it’s the same as the target language, and not like template metaprogramming in C++).

    > .NET already has it CodeDOM, which does almost the same: I say ‘almost’ because it does not create a direct representation of a program, but a representation of the program’s source code, in the language of your choosing. Later, this source code can be compiled in memory and loaded on the fly

    Ah, but that is not the point of the exercise. Generating C# code on the fly is quite doable even without CodeDOM. The trick is rather to parse the code, and manipulate it.
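    To illustrate what "manipulate it" buys you even with today’s expression trees, here is a sketch using the visitor pattern over a tree (the class and method names are mine, purely illustrative):

```csharp
using System;
using System.Linq.Expressions;

// A visitor that rewrites every integer constant 2 into 3,
// treating code as a data structure to be transformed.
class TwoBecomesThree : ExpressionVisitor
{
    protected override Expression VisitConstant(ConstantExpression node) =>
        node.Value is int i && i == 2 ? Expression.Constant(3) : node;
}

static class RewriteDemo
{
    public static int Run(int input)
    {
        Expression<Func<int, int>> doubler = x => x * 2;
        // Rewrite the tree, then compile the transformed code and run it.
        var rewritten = (Expression<Func<int, int>>)new TwoBecomesThree().Visit(doubler);
        return rewritten.Compile()(input);
    }
}
```

    Run(10) yields 30 rather than 20: the doubling code was treated as data, edited, and recompiled. Doing the same at the statement and declaration level is exactly the part that is still missing.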

    > But, if C# does eventually become its own ‘meta-language’, the Business Process Management dream may come true: just define the English (or any other ‘human’ language) keywords, like, ‘Incident’, ‘Change Request’, ‘Asset’, etc., as data types, then make the verbs like ‘deploy’ or ‘break-fix’ invoke the related workflows (for example) and — voila! — you have RUNNABLE DOCUMENTS, in English, or in your business’ native language, which the Big Bosses write to define the ‘standard operating procedures’ for their business and you then run for them! That should dwarf both WF and Azure, both by ‘being cool’ and by selling around the world

    I believe that’s how the Oslo guys are trying to sell their product now. Me, I’m not convinced. So far such experiments haven’t been successful. By the way, do you know that SQL was supposed to be a language that non-programmers could use "naturally" to get immediate results without deep exposure and learning?

  6. James Dunne says:

    Have you seen Nemerle?  The ability to extend the language’s syntax with "macros" written in the Nemerle language itself is seriously powerful.  I’m tired of external code-generation tools, including T4 and CodeSmith.  Ideally, C# should follow in Nemerle’s footsteps and allow one to define coding patterns within the language itself.  Simply adding an attribute to a method (among other ways) to compile-time modify the behavior of that method or to generate code is indeed powerful.

  7. Mark Rendle says:

    I seem to remember Anders saying something about statement trees and meta-programming in the E2E he did on the future of C# on Channel 9 recently. Meta-programming is definitely something I’d like to see, given the amount of time I spend writing code generators of one form or another, and writing generic and/or reflection code that borders on insanity.

  8. Great! This is exactly the word I was grasping for when I was trying to articulate some things about application design. Automation and scripting are more feasible to support if the built-in application features are using the same APIs. In other words, there is no "internal" and "external" API, there is just "the" API. (And maybe built-in features get access to a superset API, but the API is not disjoint.)

    The problem you run into is with an application that was not designed for automation/scripting for v1, and then suddenly this comes in as a high priority feature for v2/v3. You can either provide a facade API and snake your calls through multiple thunking layers, or you can … rewrite half the app to provide the homoiconic design you wish you’d had in the first place (both hindsight and psychic powers are 20/20 after all).

  9. Blake Coverett says:

    Count me in for another enthusiastic vote in favor of real metaprogramming.   This is easier to implement in the Lisp languages because there’s no real syntax; you are pretty much just writing bare ASTs anyway.

    Finding the right balance in a language with a rich syntax like C# is, as far as I know, not a cleanly solved problem.   How best to capture the statement/declaration/program tree in the first place?  Can quoting be avoided by using an assignment context like expression trees currently do?  For example:

    class Foo {}

    class Bar {

     void Main() {

       Class foo = typeof(Foo);

     }

    }

    But if you go down that path, you lose the obvious ways to splice in ‘unquoted’ bits.  Lots of open questions.   I hope it becomes a priority to find good answers.

  10. Joe Chung says:

    You can’t have a homoiconicity post without somebody referring to Greenspun’s Tenth Rule.

    "Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp."

    😉

  11. Eric,

    I wonder whether you or any of the C# designers have ever taken a look at Boo. It’s a statically typed programming language for .NET that enables exactly what you describe in this blog post, and it also enables what Anders Hejlsberg has shown about his vision of C# 5 in a few talks in the past.

    Boo can do this because it is already implemented in C# (and is currently being rewritten in Boo), so it can simply provide the AST representation of the program in its internal expression tree syntax. This enables things like Meta methods (method calls executed at compilation time instead of runtime), AST attributes (attributes with a Process method that gets invoked when the attributed code element is compiled), macros, etc. It also has an extensible compiler pipeline, where you can insert custom compilation steps, transforming the program tree to your heart’s desire. It is well-suited for any kind of metaprogramming, and is heavily used for DSLs.

    Whenever somebody on the C# team talks about these features, it’s like "this is so far into the future" mixed with "in ye olden times, mysterious languages like LISP could do that".  Yet, Boo can do all of this, now. On the .NET platform. Look at it: http://boo.codehaus.org/. And maybe learn from it? Or at least acknowledge that approaches like this already exist on the .NET platform.

    Regards,

    Fabian

  12. stefan.wenig@rubicon-it.com says:

    Hi Eric,

    I have two comments:

    1) having a full AST in C# would be great. And I don’t say this because it just sounds cool, but because it would enable better solutions for a number of problems we can currently only solve in very clumsy ways.

    However, I want to bring your attention to the problem of meta-programming in the static .NET type system in general. Even if we get AST-level access to an entire class, and the ability to modify that AST (which is currently not possible, and copying ASTs is not always the better solution), we’re still in a much worse position than any Lisp programmer would be. For two reasons: syntax and types. I don’t need to tell you about syntax making meta-programming harder, and it’s a problem that C# would share with many dynamic languages. The other problem might be worse:

    "In ye olden times", the .NET type system was simple. We had static types and a fallback of "object" for everything we could not express using that simple type system. This was in fact not much harder to deal with than purely dynamic languages. However, with the power of generics, we also invited a whole new level of complexity into .NET-based meta-programming. It suddenly became hard. For generic types we get so many specific combinations of generic arguments in types, members, base types and interfaces etc., and in so many differenct states (closed, open and everything in between) that in any interesting challenge in meta-programming, it becomes a nightmare event to figure out which tests to create.

    It is not an accident that we found quite a few deep bugs in the .NET framework when we created a library that brings real mixins to C#. And considering the overwhelming complexity of everything, I almost find it surprising that we didn’t find more. That’s really difficult stuff, and kudos to the C# and CLR teams for getting this right. However, it is now just as difficult to do meta-programming via Reflection. And I’m pretty sure that these problems are going to trickle down into C# 5 (?) meta-programming if they are not addressed carefully.

    I’m with Fabian: finding a way to enable meta-programming in a static .NET language is not rocket science; humankind has been there before. You might find an even better way, fine. But if you really want to make a difference here, you should focus on providing simple yet powerful ways to handle the complexities of the type system, with special attention to generics.

    (How could that look? Hell, if I knew I could probably apply for Erik Meijer’s job ;-))

    2) This is a pet topic of mine. I believe that IQueryable-based LINQ (i.e., anything but LINQ to Objects) is somewhat broken. Here’s why:

    The processing time required to transform the Queryable AST to SQL by LINQ to SQL sometimes exceeds the actual time to do the query, including the roundtrip to the server. Other meaningfully complex LINQ providers are going to experience similar problems.

    This is not surprising for people who know the way C# translates LINQ query expressions into chained method calls. Making any sense of the results is not just hard to code right, but also a lot of work for the machine. Even simple query expressions are translated into something ugly, verbose, and structurally hard to handle just so that they result in compilable C# code. This is funny, because most of the time they would never have to be compiled, but rather translated into something that in turn closely resembles the original LINQ query. What a waste of time!
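    For reference, the mechanical translation in question looks like this (LINQ to Objects used for brevity; both forms are specified to behave identically):

```csharp
using System.Linq;

static class TranslationDemo
{
    static readonly int[] Ages = { 18, 25, 40 };

    // The query expression form the programmer writes...
    public static int[] QueryForm() =>
        (from a in Ages where a > 21 select a * 2).ToArray();

    // ...and the chained method calls the compiler turns it into.
    public static int[] MethodForm() =>
        Ages.Where(a => a > 21).Select(a => a * 2).ToArray();
}
```

    An IQueryable provider receives the second form as an expression tree and must puzzle the original query back out of it, which is the overhead described above.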

    The result is a static, strongly typed compiled language that takes so many detours that the resulting performance is probably worse than in a dynamic language like Ruby that could provide a more straightforward approach.

    Now not everything is rain clouds, dragons and Bush here, all of what you’re doing makes sense. But you have to deal with the arising problems somehow.

    Currently, the LINQ to SQL team’s answer is explicitly compiled queries. This sucks for a number of reasons, the most obvious one being that it’s hard, ugly and error-prone, and not at all Language Integrated in the way that LINQ queries are supposed to be.

    Is that it? Look at those beautiful LINQ queries you can create (but for production code, we recommend this ugly workaround)?

    I believe that there has to be _some_ way to implement implicit and automatic caching of generated queries. The only problem with this is that every time a from … select statement gets executed, a new AST is created. This defies any attempt at efficient caching.

    Here’s my proposed solution: instead of burying captured variables deep inside the ASTs, automatically generate lambdas and insert the actual variables at instantiation time. The inner lambda expression (tree) can remain unchanged, and can therefore be stored in an invisible static field somewhere.

    Pseudocode: instead of

    var x;

    execute AST (source.Where (s => s.Property > x));

    you could generate

    static ast = AST (x => source.Where (s => s.Property > x));

    execute AST (ast (x));

    Now we can cache the ast object using its reference as a dictionary key.
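    In today’s C#, a rough sketch of that caching looks like this (QueryCache, GetOrCompile and the string key are hypothetical names of mine; a real version would key on the hoisted lambda itself rather than a string):

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq.Expressions;

// Sketch of the idea above: the captured variable becomes a lambda
// parameter, so the inner tree never changes, and the compiled
// delegate can be cached and reused. Names here are hypothetical.
static class QueryCache
{
    static readonly ConcurrentDictionary<string, Delegate> Cache =
        new ConcurrentDictionary<string, Delegate>();

    public static Func<T, TResult> GetOrCompile<T, TResult>(
        string key, Expression<Func<T, TResult>> template) =>
        (Func<T, TResult>)Cache.GetOrAdd(key, _ => template.Compile());
}
```

    The first call pays for Compile; subsequent calls with the same key get the cached delegate back, which is what explicitly compiled queries make you do today by hand.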

    https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=315491

    Together with your vision of ASTing entire statements, we could even capture whole LINQ statements that way, cache their transformation results, and just insert the parameters at run time. Pretty much what I have to do today using compiled queries, but transparently so.

    What do you think?

  13. Sam Aaron says:

    If you want to see an example of an object-oriented (albeit prototype based) homoiconic language, then you might find Ioke interesting. It’s not fit for production use (it’s still a baby) but it currently serves as an excellent didactic vehicle for exploring and understanding these concepts in a fairly recognisable syntax for people with Java, C#, Ruby backgrounds.

    Oh, and quite a bit of the core of the language is written in itself and has a whole load of tests which really makes things a lot more accessible and easy to understand.

    http://ioke.org

  15. Robert says:

    Why not implement a Common Lisp for .NET, like Clojure? Not a toy like IronScheme, but a real supported implementation. You will never get away from ASTs, so why not just use Lisp?

  16. pminaev says:

    > Whenever somebody on the C# team talks about these features, it’s like "this is so far into the future" mixed with "in ye olden times, mysterious languages like LISP could do that".  Yet, Boo can do all of this, now.

    Solving a problem here and now is often not hard. Ensuring that it is solved correctly is harder, and ensuring that it is solved in such a way that it does not create a lot of pain in the future is harder still.

    If you’ve been tracking the design of C#, you know that a lot of seemingly innocuous language features are there for very deep reasons. A good (and relatively well-known) example is the method overloading rules in the presence of inheritance, but I keep stumbling into more and more even now, years after I first wrote a line of C# code, and it never ceases to amaze me. The level of design effort that went into the language is truly outstanding, even despite the occasional blunders (*cough* array covariance *cough*).
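    (For anyone who hasn’t met that particular blunder, a minimal sketch: array covariance compiles cleanly and only blows up at run time.)

```csharp
using System;

static class CovarianceDemo
{
    // string[] converts implicitly to object[], so the CLR must
    // type-check every array store at run time.
    public static bool StoreIsRejected()
    {
        string[] strings = { "a", "b" };
        object[] objects = strings;   // legal: arrays are covariant
        try
        {
            objects[0] = 42;          // a boxed int is not a string
            return false;             // (never reached)
        }
        catch (ArrayTypeMismatchException)
        {
            return true;              // the runtime check fired
        }
    }
}
```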

    This all has its price. You can’t just say, "oh, I’ve seen this neat thing in Boo (or Nemerle, or whatever), let’s do the same!". Consider that the designers will have to sit down and carefully analyze the requirements and the suggested approach, determining whether it is even the best way to go about the problem (remember, once you do it and release it, you’ll have to support it!); looking for potential corner cases and pitfalls (Sam Ng’s series on some corner cases of "dynamic" was very illuminating in that respect); finding reasonable solutions to all those; etc. The solution, should any be arrived at, has to be formalized in an unambiguous way for inclusion into the language spec.

    This doesn’t end here, however. You have to test the feature – that means devising tests for maximum coverage (including all those corner cases!) and implementing them. Then there’s the IDE – you’d want your code completion, and refactoring, and on-the-fly error highlighting, and debugger support for all the new stuff, wouldn’t you? Finally, documentation – a series of articles on the core feature itself, API reference, a number of how-tos and examples for all of the above…

    Now consider the reach and magnitude of the metaprogramming feature, and think about how long each stage of the process outlined above would take. I think it would probably be more involved than LINQ… (and I was really quite surprised by how many new language features were squeezed into Orcas release, considering the time frame).

    And, of course, it’s not like metaprogramming is the only feature request for future C# versions, and probably not even the most popular one – just go to Connect and search for C# feature suggestions…

    If we get that in C# 5, it will be wonderful. But I’m not holding my breath for it. It looks like too big of a fish for now.

    Me, I’ll consider myself happy if I get "readonly class" and syntactic sugar for CodeContract 🙂

  17. Andrew says:

    The macros in Scheme (and Lisp) only work because of the beautifully regular syntax: code is data and data is code. It just wouldn’t work in C# or any other Algol-derived language. It’s amazing to think that when Lisp was created over 50 years ago, it was a near-perfect language, and it hasn’t been bettered since.

    Here’s a radical suggestion, but if you want these capabilities to set your creative skills free, then why not simply learn Scheme? There is an explosion of interest in Scheme at the moment and it is rapidly edging towards the mainstream. Even Microsoft has woken up to functional languages and has released F#.

    The more of us that create libraries for Scheme, the more everyone else can use it for real projects instead of the flawed languages we are forced to earn a living with.

  18. pminaev says:

    > It just wouldn’t work in C# or any other Algol derived language.

    As stated above by others, there are counterexamples to that, such as Nemerle (which is pretty close to C# syntactically, all in all).

    BTW, Microsoft hasn’t yet _released_ F# 🙂

  19. Matthew Kane says:

    So XSLT is homoiconic and Judy Garland isn’t? Go figure.

  20. Thomas says:

    Hold on there.  What’s so great about homoiconicity?  It gives you lots of power — to tie yourself into knots of un-manageable, un-maintainable, un-readable code.  And talk about security risks…  HUGE!  Ask yourself — why isn’t Lisp used in any major commercial setting?  Because making code -> data and data -> code creates huge pitfalls.

    I would argue that there are many more reasons than that. A language is successful in a particular domain when it successfully models the processes and concepts in that domain. Business settings tend to have a lot of concerns about processes, so it should not be surprising that procedural coding styles are important. Business settings tend to be concerned about rapidly changing external state, so it should not be surprising that coding styles involving mutable global state are common. Businesses tend to manage human scenarios by throwing the “hierarchy pattern” at problems; it should not be surprising that hierarchical object-oriented programs are common.

    Lisp can do all that of course, but the fundamental metaphor that Lisp throws at every problem — the analysis of lists by functions — is a far remove from the model world. In C# we deliberately make procedures, mutable properties, and the ability to explicitly model hierarchy all first-class concepts in the language right there for you to use.

    Though there certainly are pitfalls with homoiconicity, you can get yourself into even deeper morasses with C macros — and many people do! The success of C speaks to the desirability of giving programmers “enough rope”. You can’t go denying people a tool just because some will misuse it horribly. But of course we want to make sure that the tools are designed to be easily used well, and designed specifically for real-world scenarios. — Eric

    And any conceptual advantages of doing so pale in comparison to the problems caused.  Algol and Algol-derived languages have maintained that separation for very good reasons.  Just because something can be done, and because there may be some academic symmetries in such a concept, doesn’t mean it should be adopted. And just because programmers love to produce “cute” (if obtuse) code doesn’t mean we should design commercial languages to cater to that anti-pattern.  Careful language design needs to take into consideration much broader concepts of usefulness and usability.  I think the C# team has done a pretty good job of maintaining that perspective.

    Indeed. We’re not interested in homoiconicity and metaprogramming because it’s cool (though it is). We’re interested in it because we have many customers whose fundamental use of C# involves in large part the automatic generation of more C#. Our existing code models are not strong enough to do all the tasks our customers would like them to do. — Eric

  21. Now that we have seen what a lambda expression is, we can look at another new feature that is much more

  23. But, Eric, Common Lisp has a well fleshed-out OO system, and it is not really FP-centric – at least most CL code that I’ve seen uses mutable state and imperative constructs (such as loops) heavily.
