More ruminations on DSLs

Article
09/13/2004

A domain specific language is a language that's tuned to describing aspects of the chosen domain. Any language can be domain specific, provided you are able to identify the domain it is specific to and demonstrate that it is tuned to describe aspects of that domain. C# is a language specific to the (rather broad) domain of OO software. Its not a DSL for writing insurance systems, though. You could use it to write the software for an insurance system, but it's not exactly tuned to that domain.

So what is meant by the term 'domain'?. A common way to think about domains is to categorize them according to whether they are horizontal or vertical. Vertical domains include, for example: insurance systems, telephone billing systems, aircraft control systems, and so on. Horizontal domains include, for example, the bands in the classic waterfall method: requirements analysis, specification, design, implementation, deployment. New domains emerge by intersecting verticals and horizontals. So, for example, there is the domain of telephone billing systems implementation, which could have a matching DSL for programming telephone billing systems.

Domains can be broad or narrow, where broad ones can be further subdivided into narrow ones. So one can talk about the domain of real-time systems, with one sub-domain being aircraft control systems. Or the domain of web-based systems for conducting business over the internet, with a sub-domain being those particular to insurance versus another sub-domain of those dealing in electrical goods, say. And domains may overlap. For example, the domain of airport baggage control systems includes elements of real-time systems (the conveyer belts etc. that help deliver the luggage from the check-in desks to the aircraft) and database systems (to make a record of all the luggage checked in, its weight and who it belongs to, etc.).

So there are lots of domains. But is it necessary to have a language specific to each of them? Couldn't we just identify a small number of general purpose languages that cover the broad domains, and just use those for the sub-domains as well?

What we notice in this approach is that users demand general purpose languages that have extensibility mechanisms which allow the base language to be customized to narrower domains. There's always a desire to identify domain specific abstractions, because the right abstractions can help separate out the things that vary between systems in a domain and things that are common between them: you then only have to worry about the things that vary when defining systems in that domain.

Two extensibility mechanisms in common use today are:

class inheritance and delegate methods, which allow one to create OO code frameworks;
stereotypes and tagged values in UML which provide primitive mechanisms for attaching additional data to models.

These mechanisms take you so far, but do not exactly deliver customized languages that intuitively capture those domain specific abstractions - the problem is that the base language gets in the way. Using OO code frameworks is not exactly easy: it requires you to understand all or most of the mechanisms of the base language; then, although you get clues from the names of classes, methods and properties on where the extension points are, there is no substitute for good documentation, a raft of samples and understanding the framework architecture (patterns used and so on). Stereotypes and tagged values in UML are powerful in that you can decorate a model with virtually any data you like, but that data is generally unstructured and untyped, and often the intended meaning takes you a long way from the meaning of the language as described in the standard. Neither OO framework mechanisms or UML extensibility mechanisms, allow you to customize the concrete notation of the language, though some UML tools allow stereotypes to be identified with bitmaps that can be used to decorate the graphical notation.

Instead of defining extensibility mechanisms in the language, why not just open up the tools used to define languages in the first place, either to customize an existing language or create a new one?

Well, it could be argued that designing languages is hard, and tooling them (especially programming languages) even harder. And the tools used to support the language design process can only be used by experts. That probably is the case for programming languages, but I'm not sure it needs to be the case for (modelling) languages that might target other horizontal domains (e.g. design, requirements analysis, business modelling), where we are less interested in efficient, robust and secure execution of expressions in the language, and more interested in using them for communication, analysis and as input to transformations. Analysis may involve some execution, animation or simulation, but, as these models are not the deployed software, it doesn't have to be as efficient, robust or secure. Other forms of analysis include consistency checking with other models, possibly expressed in other DSLs, taking metrics from a model, and so on. Code generation is an obvious transformation that is performed on models, but equally one might translate models into other (non-code) models.

It could also be argued that having too many languages is a barrier to communication - too much to learn. I might be persuaded to agree with that statement, but only where the languages involved are targeted at the same domain and express the same concepts differently for no apparent reason (e.g. UML reduced the number of languages for sketching OO designs to one). Though it is worth pointing out that just having one language in a domain can lead to stagnation, and for domains where the languages and technologies are immature, inevitably there will be a plethora of different approaches until natural selection promotes the most viable ones - unless of course this process is interrupted by premature standardization :-). On the other hand, where a language is targeted on a broad domain, and then customized using its own extensibility mechanisms, the result carries a whole new layer of meaning (OO frameworks, stereotypes in UML), or even an entirely different meaning (some advanced uses of stereotypes). In the former case, there is a chance that someone who just understands the base language might be able to understand the extension without help; in the latter case, I'd argue that the use of the base language can actually hinder understanding, as it replaces the meaning of existing notation with something different.

Finally, whether we like it or not, people and organizations will continue to invent and use their own DSLs. Some of these may never ever get completed and will continue to evolve. Just look at the increasing use of XML to define DSLs to help automate the software development process - input to code generators, deployment scripts and so on. Yes, XML is a toolkit for defining DSLs; it's just that there are certain things missing: you can't define your own notation, certainly not a graphical one; the one you get is verbose; validation of well-formedness is weak.

Am I going to tell you what a toolkit for building domain specific modelling languages should look like? Soon I hope, but I've run out of time now. And I'm sure that some folks reading this will give feedback with pointers to their own kits.

One parting thought. In this entry, I have given the impression that you identify a domain and then define one or more languages to describe it. But perhaps it's the other way round: the language defines the domain…

More ruminations on DSLs

Additional resources