UML Syntax and Semantics


So what really is the difference between syntax and semantics in UML?


First there is the notation, or concrete syntax.  This defines which shapes are allowed on the diagrams (rectangles, ovals, lines, arrows, solid or dashed strokes, compartments, annotations, adornments, and so on), together with a set of rules about how these shapes combine and appear.


Then there is the abstract syntax.  This defines a set of concepts that the shapes correspond to: classes, components, use cases, activities, etc.  What makes it abstract is that it is independent of which shapes are used to represent the concepts, or where those shapes appear.
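To make the distinction concrete, here is a minimal sketch in Python.  It is not taken from the UML specification: the names ClassConcept, render_box and render_text are invented for illustration.  The point is only that one abstract-syntax object can be drawn in more than one concrete syntax without the concept itself changing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassConcept:
    """Abstract syntax (hypothetical): a class concept, with no shapes involved."""
    name: str
    attributes: tuple = ()


def render_box(c):
    """One concrete syntax: a rectangle with a name compartment and an attribute compartment."""
    rows = [c.name, *c.attributes]
    width = max(len(r) for r in rows)
    border = "+" + "-" * (width + 2) + "+"
    lines = [border, f"| {c.name.ljust(width)} |", border]
    lines += [f"| {a.ljust(width)} |" for a in c.attributes]
    lines.append(border)
    return "\n".join(lines)


def render_text(c):
    """Another concrete syntax: a one-line textual form of the same concept."""
    return f"class {c.name} {{ {'; '.join(c.attributes)} }}"


order = ClassConcept("Order", ("id", "total"))
print(render_box(order))
print(render_text(order))
```

Both renderings are produced from the same ClassConcept value; swapping the renderer changes the notation and nothing else.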


Then there is the semantics.  This supposedly defines the meaning of models, and this is where things get tricky.


There are many ways of defining the semantics of computer languages.  Essentially, they involve systematically mapping expressions in the language to expressions in another language that you believe already has well-understood semantics.  The other language might be some combination of arithmetic, logic, and set theory.  In practice, though, the semantics of programming languages are defined by the compiler, and hopefully all compilers for a given language give the same result.
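As a toy illustration of "mapping expressions in the language to expressions in another language", here is a sketch of a tiny expression language whose meaning is given by mapping it onto ordinary integer arithmetic.  The classes Num, Add and Mul and the function meaning are made up for this example; real language definitions are far more elaborate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Add:
    left: object
    right: object


@dataclass(frozen=True)
class Mul:
    left: object
    right: object


def meaning(expr):
    """Map an expression tree to an integer, i.e. to a language we already understand."""
    if isinstance(expr, Num):
        return expr.value
    if isinstance(expr, Add):
        return meaning(expr.left) + meaning(expr.right)
    if isinstance(expr, Mul):
        return meaning(expr.left) * meaning(expr.right)
    raise TypeError(f"not an expression: {expr!r}")


# The tree for 2 + 3 * 4 is pure syntax; meaning() says what it denotes.
print(meaning(Add(Num(2), Mul(Num(3), Num(4)))))
```

The trees built from Num, Add and Mul carry no meaning by themselves; the meaning function supplies it by translation into arithmetic.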


So for models that generate programs, the semantics of the models are defined by the programs they translate to, which are in turn defined by the compiler.


But UML is supposed to be able to map to multiple languages.  UML class diagrams ought to be able to map accurately to programs in VB.Net, Java, C#, C++, possibly as well as JavaScript, COBOL, and Python.  Similarly, UML sequence diagrams linked to those class diagrams ought to be able to visualize execution traces for all of these programs, and component models ought to be able to represent ports implemented by interfaces on those class diagrams.
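Here is a rough sketch of what mapping one class model to more than one language looks like.  Everything in it is an assumption made for illustration: the ModelClass structure, the two type tables and the generator functions are hypothetical, and a real generator would handle far more than attributes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelClass:
    """A hypothetical, very reduced class model: a name and (attribute, type) pairs."""
    name: str
    attributes: tuple


# Assumed mappings from model-level type names to target-language types.
JAVA_TYPES = {"Integer": "int", "String": "String"}
PYTHON_TYPES = {"Integer": "int", "String": "str"}


def to_java(c):
    """Generate a Java class skeleton from the model."""
    fields = "".join(f"    private {JAVA_TYPES[t]} {n};\n" for n, t in c.attributes)
    return f"public class {c.name} {{\n{fields}}}\n"


def to_python(c):
    """Generate a Python class skeleton with annotated attributes from the same model."""
    body = "".join(f"    {n}: {PYTHON_TYPES[t]}\n" for n, t in c.attributes)
    return f"class {c.name}:\n{body or '    pass' + chr(10)}"


order = ModelClass("Order", (("id", "Integer"), ("customer", "String")))
print(to_java(order))
print(to_python(order))
```

Even in this small case the targets disagree about what the model means: a Java int is a 32-bit value that wraps on overflow, while a Python int is unbounded.  The model's "Integer" acquires one meaning or the other only once a target has been chosen.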


Today, such an accurate mapping can’t be done without bending the rules.  But let’s be hopeful and assume that it becomes possible.  In such a world, what are the semantics of UML?  There must be some, otherwise all we can say about UML is that it is a language of shapes and lines which can be used for anything.


Well, there are some absolute basics which should apply regardless of how UML is applied.  We could call these axioms.


One aspect of the UML definition is its well-formedness rules, such as: elements in the same package must be uniquely named; the slots in an instance of a class must be related to features in the class or one of its superclasses; the edges coming into and out of a fork node must be either all object flows or all control flows.  These well-formedness rules are the axioms of UML.  UML really has no built-in semantics apart from these axioms and what can be derived from them (the theorems, if you like).  Additional semantics only appear when UML is mapped into some target language or platform, either explicitly through a generator or transformation, or implicitly in your head when you draw it on a whiteboard or sketching tool.
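Two of the rules mentioned above can be sketched as simple checks.  This is a deliberately simplified illustration, not the rules as the specification states them: the ModelledClass structure and both function names are invented, and names are compared as plain strings with no regard to element kind.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModelledClass:
    """Hypothetical stand-in for a class: its own feature names and an optional superclass."""
    name: str
    features: set = field(default_factory=set)
    superclass: Optional["ModelledClass"] = None

    def all_features(self):
        """Features declared on this class plus everything inherited."""
        inherited = self.superclass.all_features() if self.superclass else set()
        return self.features | inherited


def duplicate_names(element_names):
    """Names used more than once in a package (simplified: ignores element kinds)."""
    counts = Counter(element_names)
    return sorted(n for n, k in counts.items() if k > 1)


def unknown_slots(cls, slot_names):
    """Slot names on an instance that match no feature of the class or its superclasses."""
    return sorted(set(slot_names) - cls.all_features())


party = ModelledClass("Party", {"name"})
customer = ModelledClass("Customer", {"creditLimit"}, superclass=party)

print(duplicate_names(["Order", "Customer", "Order"]))           # violations of rule 1
print(unknown_slots(customer, ["name", "creditLimit", "shoe"]))  # violations of rule 2
```

An empty list from either check means the rule holds for that input; anything else names the offending elements.  Note that the checks say nothing about what a well-formed model means, which is the point of the paragraph above.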

Comments (8)

  1. Cavan Watson says:

    Implicitly in your head perhaps… in mine there’s at least one visual transform into an almost randomly ordered set of pen flicks, and I remember it too (often surprisingly loud).

    "Kevin"    

  2. This week in the blog carnival: modeling, T4, C# 4.0, framework design, SOA, performance and ASP.NET related

  3. Syed Salman Qadri says:

    If only it were that simple. The trouble with UML is that it has way too many edge cases. For example, your first axiom is false for elements that are Associations, as you can have multiple Association elements in a package that have the exact same name.

  4. stevecook says:

    Syed

    I am acutely aware of how complicated it is.  If you read the UML specification, you will find that association names are indeed supposed to be unique in a package.  UML tool vendors may have chosen to ignore this, of course.

  5. Syed Salman Qadri says:

    Can you point that out to me in the spec? If you look at the rules for UML PackageMerge, for example, it clearly allows for Associations with duplicate names; it determines their uniqueness by looking at the combination of the Association's name and the names of its member ends: "Elements that are a kind of Association match by name (including if they have no name) and by their association ends where those match by name and type (i.e., the same rule as properties)." In general, it would be a pain if you had to name every single Association in your package uniquely. Usually people tend to name them all "" :).

  6. stevecook says:

    The rules for name uniqueness are defined in Namespace and NamedElement in the kernel (7.3.33 and 7.3.34 in version 2.1.2).  Look at the operation isDistinguishableFrom in NamedElement.

    I do agree that this rule should be relaxed for associations, properties, operations etc.   The operation isDistinguishableFrom is overridden in BehavioralFeature.  It should be overridden in other places too (those edge cases that you refer to).

  7. Syed Salman Qadri says:

    Well, if you are correct then it's not just that the UML vendors have it wrong, but the UML metamodel is itself flawed. For example, look at "AuxiliaryConstructs-Templates-Package" in Superstructure. It has 3 Associations with the name ‘A_templateParameter_parameteredElement’.

  8. stevecook says:

    Syed – too right it is flawed.  You can post bugs by sending an email to issues@omg.org.  Validating that UML is a valid instance of UML would be a great exercise: we don’t yet have the spec in a form that makes this possible.