Why language features die, and language extensibility

Rick Byers wrote (some time ago):

Thanks for the awesome post Eric. I’d be interested in hearing more detail about the sorts of things that cause features to be rejected. Is it common to reject a feature that you think would be valuable only because of syntactic compatibility limitations (parser ambiguity, breaking change, etc)?

What are your thoughts on how language evolution should work in general (outside the confines of C#)? Do you think it would be possible to have languages that could more readily accept the type of extensions you’ve wanted to make to C# but couldn’t?

For example, do you think there would be value in a language that added a layer of abstraction between the syntax presented to a user and the persisted form? E.g. if a language were stored on disk as an XML representation of the parse tree, then you could evolve the language (add keywords, etc.) and rely on the IDE tools to intelligently present the code to the user.

I’ve been saving up this one for a while now.

It’s common for us to reject some features because they aren’t in line with our language philosophy.

It’s also fairly common for us to reject a feature because we can’t come up with a good syntax for the feature. Sometimes this is because we just don’t like the constructs we come up with, because they are ugly, or they don’t really make things simpler for the user, or they don’t cover the right scenarios. The syntax we can use is heavily constrained by the existing structure of the language. Take a look at your keyboard, look at all the special characters, and tell me which ones aren’t already used for something in C#. The list is very short, so we are constrained by the operators that are available. We’re also constrained by whether our change would be breaking, and in what situations things would be breaking. C# 2.0 has no major breaking changes, and though that isn’t an absolute rule for us, it’s certainly a goal. Adding new keywords is, in general, a bad thing to do.

Finally, we’re constrained by what the runtime can/will implement, and whether things can be implemented across languages. Some features only make sense if they’re done in all the languages, but that means all languages need to agree before we do it.

Rick also asked about language evolution.

There are different opinions about this. Some believe that languages should never change. Others believe that they should be able to extend their language at will. An extreme example of this is Intentional Programming.

I think I’m one of the few people around who have actually played around with intentional programming. Conceptually, it’s interesting, but in the real world, I think the “everybody designs their own language” approach is challenging at best. One can envision a world where the user representation is extensible but the underlying representation is standard, but I think that’s a bad world to be in. It may be great for you, but it’s probably not good for your team, or the poor guy who takes over your code two years from now. And there’s a lot to be said for the “code in a text file” world.

We have well-defined ways for users to add functionality – through classes, methods, interfaces, etc. I think that languages should only consider adding features when there is an obvious shortcoming in solving the problem through existing functionality. At that point, you need to understand those issues and determine whether the language solution is the right way to address them.

So I’m not big on extensible languages. Existing facilities – such as macros in C++ – do have their uses, but they are a disaster from a readability standpoint (both for the compiler and the developer).

Comments (7)

  1. Steve Perry says:

    To me the ultimate in extensible languages is <a href="http://el.media.mit.edu/logo-foundation/logo/index.html">LOGO</a>.

    But you are correct: extensible languages are interesting for educational purposes, but not practical for everyday programming.

  2. gab says:

    Well, but the LISP guys and partially the Smalltalk guys have been rewriting their own languages for decades, and the results have not been so bad…

  3. Mark Hurd says:

    There is definitely an alternative school of thought: Lisp, Forth, C Macros, VB6 and VBA all allow the programmer to define constructs that are indistinguishable from built-in language keywords and features.

    However, clearly some people can never really treat a construct as a black box. Irrespective of how simple the use of the construct is, they always complain about the hidden complexity, even though it /is/ hidden.

    It seems these people have been listened to more when C# and VB.NET (and the .NET Framework) were designed.

  4. M Knight says:

    Mark Hurd, one reason I can think up for the complaining about "hidden" complexity is the concept of leaky abstractions. Abstractions are never perfect; you always get something leaking through.

  5. Darren Oakey says:


    I have been thinking heaps about what you’ve said about extensible languages and stuff. I’ve been going round and round, because I understand where you’re coming from completely – in fact I’ve seen problems in the past exactly as you mention. Years and years and years ago I was working in C++, and needed to do a case across a lot of strings… I created a #define STEQ which sort of gave me a syntax like:

    > STEQ("firstCommand") {DoFirst();}

    > STEQ("secondCommand") {DoSecond();}

    > STEQ("thirdCommand") {DoThird();}

    Now, obviously, I think it’s hideous, but at the time I thought it was a good idea, because there were a LOT of cases and I wasn’t that experienced (this is more than 10 years ago we’re talking about) – but the main point is, the people who maintained it after me whinged, because they hadn’t seen the construct before…
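    The macro definition itself isn’t shown; a minimal C++ sketch of what a STEQ along those lines might have looked like (the dispatch-variable name cmd is an assumption – the original is lost):

```cpp
#include <cstring>

// A guess at the shape of that STEQ macro: each use compares a dispatch
// string named "cmd" against a literal and runs the attached block on a
// match. Nothing at the call site reveals the hidden if/strcmp.
#define STEQ(s) if (std::strcmp(cmd, (s)) == 0)

const char* Dispatch(const char* cmd) {
    STEQ("firstCommand")  { return "first";  }
    STEQ("secondCommand") { return "second"; }
    return "unknown";
}
```

    Which is exactly the maintainers’ complaint: the construct reads like a keyword, but you have to find the #define to learn what it does.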


    The more I thought about all of these issues, the muddier the answer got… First, you mention intentional programming. You talk about the way MS is doing IP, and that many people who like the idea won’t like the practice. I think I very much disagree with that. The thing is, we ALL are using IP-type principles every day. I almost never build a form by hand, I use a forms designer – I HATE the fact that it actually generates code that I can see, and applaud you for addressing that in this next version of the library… but the fact remains, every time we use a dataset designer, a class designer, a forms designer, we are doing IP-like stuff. Even more, when we go to Avalon we’ll be using XAML, and at the moment we are using HTML for web stuff – so we are already using a conglomerate of languages and tools to build our apps, and I think every one of these improves productivity. Combine code snippets, enterprise templates, and all the designers together, and you have an IP language!

    I’ve worked at a place where all the development was driven by Rose – because it was C++, and the Rose reverse engineering just didn’t work on our project, because it had been migrated from older and older versions of Rose – if you wanted to add a function, you had to do it in Rose. It was a pain in the neck, but we effectively used Rose as our primary development tool.

    I wasn’t that crash hot on that, but when I moved on to a new role where I had complete control, I took Rose with me – but all test cases and all "high level" functionality had to first be created as a sequence diagram (I have very little use for class diagrams, but sequence diagrams and collaboration diagrams are brilliant). We were then in VB, and the reverse engineering worked… and if XDE wasn’t such a piece of ____ I’d be doing the same thing with .NET…

    So basically, the more specific languages, designers, diagrammers, builders, generators, etc. the better, IMHO – they are all productivity improvers in some way – if they weren’t, we wouldn’t use them, and they’d die a natural death…

    But, it gets trickier when you talk about people modifying the language themselves – because at least with generators etc, one particular company will likely only use a few, so there is not too much for a new developer to learn – whereas customisable syntax could just confuse anyone…

    or could it.

    I realised there is one factor that really decides the issue for me… and that is the skills of the developers… or more precisely, the fact that the skills of developers are _vastly_ different – and to me, this has EVERYTHING to do with whether or not we need to be able to customise the language.

    Now me… let’s be immodest for a second… I got into uni when I was 13 – I started programming more than 20 years ago, and have programmed in more languages than I can remember. I’ve probably averaged at least 8 hours a day, every day, on a computer since then. I figure, I sit down at a machine, I know what I’m doing.

    But I’ve worked with, for example, programmers who were older government employees who used to look after an old database and now have nothing to do, so are being reskilled. When I was asked what sort of course they should go on to learn VB.NET, I answered "a course on OO" – but they then made the mistake of asking the people themselves, who immediately said… VB.NET. So – armed with no knowledge of programming and 4 days of intensive "Introduction to VB.NET" – you have yourself a maintenance development team!

    Another example: in a different company, I was brought in by a team who had kept pretty much to themselves, to do a review of their systems and see if I could fix the performance problems. I discovered immaculate, beautiful, disciplined code… but the people who’d designed it (this was a VB program) had decided to store ALL program state in a set of global structures (basically an in-memory database)… and so I saw inner loops in a plotting program with lines like int nextPoint = gblah[sdf].sdfdsf[gblah[dyd]].sdfasdf.sadf.blah[y].something[x].blah. I sped up their system more than 20x with the simple fix of setting blah = [most of the above expression] OUTSIDE the loops!
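    The fix described there is classic loop-invariant hoisting. A minimal C++ sketch of the idea, with invented structures standing in for the global in-memory database (the real code isn’t shown):

```cpp
#include <vector>
#include <cstddef>

// Invented stand-ins for the deeply nested global state described above.
struct Row   { std::vector<int> points; };
struct Table { std::vector<Row> rows; };

// Slow shape: the member-access chain is re-evaluated on every iteration.
int SumPointsSlow(const Table& table, std::size_t rowIndex) {
    int sum = 0;
    for (std::size_t i = 0; i < table.rows[rowIndex].points.size(); ++i)
        sum += table.rows[rowIndex].points[i];
    return sum;
}

// Fast shape: the chain is loop-invariant, so bind it to a local
// reference once, outside the loop.
int SumPointsFast(const Table& table, std::size_t rowIndex) {
    const std::vector<int>& pts = table.rows[rowIndex].points;
    int sum = 0;
    for (std::size_t i = 0; i < pts.size(); ++i)
        sum += pts[i];
    return sum;
}
```

    Both return the same result; the only difference is how many times the lookup chain runs.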

    So anyway, the point is, at any business, you will have brilliant coders who study, who know what they are doing, and can make the world alright with a flick of a neuron. And then you have people who couldn’t spell the word encapsulation, let alone tell you what it meant. And most people lie somewhere in the middle (dare I suggest leaning to one side? :))

    And then you have coding standards.

    What a waste of time.

    Because… the average programmer, when asked why they aren’t doing what’s in the standards document… looks at you blankly and asks… "we have a standards document?"

    And we have FxCop. I have a problem with FxCop… it’s absolutely vital, of course… but if MS is going to put a rule in, or I’m going to put a rule in, then… well… it’s a rule! It must be followed. It should have the same status as all the other rules – like the rule that int x = "hello" is bad. The user shouldn’t check in and wait half an hour for the build to tell them that they have named a variable badly – IntelliSense should do that as they’re typing. So… that’s the first and most important reason for some sort of macro or CodeDom interaction: to a greater degree than with plain code, the "good" programmers can FORCE their standards upon the other coders.

    Onto readability. I understand this is the main point against macros and things – because you sort of want the code to LOOK like C# – it’s a problem if coders come in and don’t understand what they are seeing… Or is it? I thought about that, and I’m not so sure. Bear in mind one big assumption I’m making: the only people in the organisation allowed to use these features are the "gurus" – the architects who are directing the environment, not the people using it… so the people using these tools know what they are doing. I know that’s not necessarily a valid assumption, but leave it for another discussion.

    So I was thinking: when you look at an unfamiliar piece of code, what takes you the time? Well, I switched from VB to C# in about an hour – sure, I had a C++ background, but others in my team did similar things, and they’d never seen other languages. I realised that the syntax of a language is typically VERY easy and quick to pick up. What takes you ALL the time is learning the class library. I can look at a piece of Java and tell you exactly what it does – but it takes me a while to sit down and program in Java, because I don’t know Swing and the other libraries as well as I do System.Windows.Forms.

    So… is it really a bad thing to have custom keywords and language constructs? Especially if they are only used for COMMON functionality – basically, again, enforcing standards so that the good people have control over what the less experienced people do. Let’s take a concrete example. One of the most common patterns I use is the WRAPPER. If we use a 3rd-party product, we always wrap it to protect ourselves against change – but more frequently, if I make a specific control, I want to protect myself against it being used badly. I want to lock down things about it.

    For instance, suppose I’m making a BankAccount edit control. I’m thinking Lego blocks here. As far as I’m concerned, the entire interface you should be able to see for the BankAccount edit control is "here’s the bank account to edit". I don’t want the BackColor exposed, because I don’t want the coder to be able to change the back color. If someone is editing a bank account on one screen, it should have exactly the same BackColor as on another screen – it’s our "standard" look and feel. I’m sorry, but you, the developer, don’t have the RIGHT to change it. You’re not allowed! Also, it’s none of your business that it is just an SMBAutoDataEditor – because you might be able to cast it to that and do things that I’m not expecting you to do… and I don’t want you to do that… So I wrap it. Instead of INHERITING from SMBAutoDataEditor, I create BankAccountEditor, a user control which has an SMBAutoDataEditor added to it. So, really, it should be just the following lines:

    > public class BankAccountEditor : ControlWrapper<SMBAutoDataEditor>

    > {

    > public BankAccountEditor() : base( new SMBAutoDataEditor() ) {}

    > public BankAccount BankAccountToEdit {get {return EditedValue as BankAccount;} set {EditedValue = value;}}

    > }

    But it’s not that simple, because obviously there are SOME things that I might want to expose. Let’s suppose I want the Description, HelpText, and PreferenceCategory properties… so I have to go:

    > public string HelpText {get {return WrappedControl.HelpText;} set {WrappedControl.HelpText = value;}}

    > public string Description {get {return WrappedControl.Description;} set {WrappedControl.Description = value;}}

    > public PreferenceCategory PreferenceCategory {get {return WrappedControl.PreferenceCategory;} set {WrappedControl.PreferenceCategory = value;}}

    What a right-royal pain… If I could muck with the language and extend it, I would make a word "expose" and replace the above lines with the following:

    > expose HelpText, Description, PreferenceCategory from WrappedControl;

    Not only is this easier to read, easier to type, quicker to build, and easier to maintain, but when I go and change the type of Description to be ControlDescription instead of string – it all just works! Now, I think, for anyone, the latter is MORE READABLE, but especially so since it is a common word that would occur all through the code. Sure, someone unfamiliar with it will have to look it up… but then they’ll also have to look up PreferenceCategory, because it’s a class I’ve created – what’s the difference?
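    As it happens, something close to the proposed "expose" can be approximated today with exactly the facility the post warns about – a forwarding macro. A C++ sketch, with every name invented for illustration (Inner stands in for the wrapped control):

```cpp
#include <string>

// Hypothetical wrapped control; stands in for SMBAutoDataEditor.
class Inner {
public:
    std::string GetHelpText() const { return helpText_; }
    void SetHelpText(const std::string& v) { helpText_ = v; }
private:
    std::string helpText_;
};

// A forwarding macro in the spirit of the proposed "expose" keyword:
// one line per property instead of a hand-written getter/setter pair.
#define EXPOSE(Type, Name)                                        \
    Type Get##Name() const { return wrapped_.Get##Name(); }       \
    void Set##Name(const Type& v) { wrapped_.Set##Name(v); }

class Wrapper {
public:
    EXPOSE(std::string, HelpText)
private:
    Inner wrapped_;
};
```

    The one-liner delivers the maintainability he wants – and, just as the post argues, it also hides a construct a new reader has to go look up.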

    However… the final kicker for language extensibility… comes down purely to the point of "what was left out". I have very strong opinions about what a language should look like. I think it’s _wrong_ to have variables nullable by default, and want to change it. I’m still putting a null guard on every variable passed into every function, and requiring the people who work at this company to do the same – which just adds needless lines to every function. I think a modern language requires a const. And because VS2005 is beta, and my company would get rather upset if I started building production code in it, we’re still using a code generator to generate reams upon reams of collection classes, data classes, etc., because we don’t have generics. Which would all be fine and dandy if people could be trusted not to dick with the code that is produced by the generator… but they can’t – and they don’t – so we have a thousand collection classes that are _almost_ the same. While VS2005 is brilliantly addressing a lot of my issues – the point is,

    without language extensibility, I can’t address them NOW

    with language extensibility, I could!
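    For a concrete picture of the null-guard boilerplate being complained about, here is a minimal sketch in C++ (the original code was VB/C#; all names are invented):

```cpp
#include <stdexcept>
#include <string>
#include <cstddef>

// The guard every function starts with – exactly the repetition that a
// non-nullable-by-default language would make unnecessary.
void RequireNotNull(const void* arg, const char* name) {
    if (arg == nullptr)
        throw std::invalid_argument(std::string(name) + " must not be null");
}

std::size_t NameLength(const std::string* account) {
    RequireNotNull(account, "account");  // boilerplate line one of many
    return account->size();
}
```

    Multiply that one guard line by every parameter of every function and the "needless lines" add up quickly.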

    So… I guess that’s my vote – and my rationale. Language extensibility is a MUST HAVE requirement for one reason and one reason only – to allow better programmers to FORCE less experienced coders to code in exactly the way that they want them to code. But that reason is, IMNHSO, quite sufficient!