UML Syntax and Semantics

So what really is the difference between syntax and semantics in UML?

First, there is the notation, or concrete syntax.  This defines which shapes are allowed on the diagrams (rectangles, ovals, solid and dashed lines, arrows, compartments, annotations, adornments, etc.), together with a set of rules about how these shapes combine and appear.

Then there is the abstract syntax.  This defines a set of concepts that the shapes correspond to: classes, components, use cases, activities, etc.  What makes it abstract is that it is independent of which shapes are used to represent the concepts, or where those shapes appear.
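To make the split concrete, here is a minimal sketch in Python.  It is not how any UML tool is built; the names UmlClass, render_as_box and render_as_text are invented for illustration.  One abstract element is rendered in two different notations, and neither rendering changes what the element is.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UmlClass:
    """Abstract syntax: a concept, with no shape or position attached."""
    name: str
    attributes: tuple = ()


def render_as_box(cls: UmlClass) -> str:
    """One concrete syntax: a rectangle with a name and an attribute compartment."""
    rows = [cls.name, *cls.attributes]
    width = max(len(row) for row in rows)
    border = "+" + "-" * (width + 2) + "+"
    lines = [border, "| " + cls.name.ljust(width) + " |", border]
    lines += ["| " + attr.ljust(width) + " |" for attr in cls.attributes]
    lines.append(border)
    return "\n".join(lines)


def render_as_text(cls: UmlClass) -> str:
    """Another concrete syntax: a single line of text."""
    return f"class {cls.name} {{ {'; '.join(cls.attributes)} }}"


customer = UmlClass("Customer", ("name: String", "email: String"))
print(render_as_box(customer))
print(render_as_text(customer))
```

Both outputs describe the same UmlClass value.  Swapping the box for the one-line form, or moving the box elsewhere on a page, leaves the abstract syntax untouched.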

Then there is the semantics.  This supposedly defines the meaning of models, and this is where things get tricky.

There are many ways of defining the semantics of computer languages.  Essentially, they involve systematically mapping expressions in the language to expressions in another language that you believe already has well-understood semantics.  The other language might be some combination of arithmetic, logic, and set theory.  In practice, though, the semantics of programming languages are defined by the compiler, and hopefully all compilers for a given language give the same result.
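As a rough illustration of the mapping idea, the sketch below gives meaning to a tiny, made-up expression language by translating each expression into an ordinary integer.  The language (Num, Add, Mul) and the function name meaning are assumptions chosen for the example; the point is only that the meaning is borrowed from arithmetic, which we already trust.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Mul:
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Add, Mul]


def meaning(expr: Expr) -> int:
    """Map each expression to an integer, relying on ordinary arithmetic."""
    if isinstance(expr, Num):
        return expr.value
    if isinstance(expr, Add):
        return meaning(expr.left) + meaning(expr.right)
    if isinstance(expr, Mul):
        return meaning(expr.left) * meaning(expr.right)
    raise TypeError(f"not an expression: {expr!r}")


# The expression (2 + 3) * 4
example = Mul(Add(Num(2), Num(3)), Num(4))
print(meaning(example))  # 20
```

Of course, this sketch also shows the practical shortcut: the "arithmetic" here is really whatever the Python interpreter does with + and *.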

So for models that generate programs, the semantics of the models are defined by the programs they translate to, which are in turn defined by the compiler.

But UML is supposed to be able to map to multiple languages.  UML class diagrams ought to be able to map accurately to programs in VB.Net, Java, C#, C++, possibly as well as JavaScript, COBOL, and Python.  Similarly, UML sequence diagrams linked to those class diagrams ought to be able to visualize execution traces for all of these programs, and component models ought to be able to represent ports implemented by interfaces on those class diagrams.
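A toy version of such a mapping might look like the sketch below.  Everything in it is hypothetical: ModelClass and Attribute stand in for a real class model, and the two type tables are invented for the example.  It emits source text for two targets from the same model element.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    name: str
    type_name: str  # a model-level type name such as "String" or "Integer"


@dataclass(frozen=True)
class ModelClass:
    name: str
    attributes: tuple


# Hypothetical type mappings; every target needs its own table.
JAVA_TYPES = {"String": "String", "Integer": "int"}
PYTHON_TYPES = {"String": "str", "Integer": "int"}


def to_java(cls: ModelClass) -> str:
    """Emit a Java class with one private field per attribute."""
    fields = [f"    private {JAVA_TYPES[a.type_name]} {a.name};" for a in cls.attributes]
    return "\n".join([f"public class {cls.name} {{", *fields, "}"])


def to_python(cls: ModelClass) -> str:
    """Emit a Python dataclass with one annotated field per attribute."""
    fields = [f"    {a.name}: {PYTHON_TYPES[a.type_name]}" for a in cls.attributes]
    return "\n".join(["@dataclass", f"class {cls.name}:", *(fields or ["    pass"])])


person = ModelClass("Person", (Attribute("name", "String"), Attribute("age", "Integer")))
print(to_java(person))
print(to_python(person))
```

Even this tiny case shows the difficulty.  The model says "Integer", but a Java int is a 32-bit value while a Python int is unbounded, so the two generated programs do not mean quite the same thing.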

Today, such an accurate mapping can't be done without bending the rules.  But let’s be hopeful and assume that it becomes possible.  In such a world, what are the semantics of UML?  There must be some, otherwise all we can say about UML is that it is a language of shapes and lines which can be used for anything.

Well, there are some absolute basics which should apply regardless of how UML is applied.  We could call these axioms.

One aspect of the UML definition is its well-formedness rules, such as: elements in the same package must be uniquely named; the slots in an instance of a class must be related to features in the class or one of its superclasses; the edges coming into and out of a fork node must be either all object flows or all control flows.  These well-formedness rules are the axioms of UML.  UML really has no built-in semantics apart from these axioms and what can be derived from them (the theorems, if you like).  Additional semantics only appear when UML is mapped into some target language or platform, either explicitly through a generator or transformation, or implicitly in your head when you draw it on a whiteboard or in a sketching tool.
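The three example rules can be sketched as checks over a toy model.  This is only an illustration: ToyClass and the function names are made up, the model is nothing like the real UML metamodel, and single inheritance is assumed to keep it short.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToyClass:
    name: str
    features: frozenset
    superclass: Optional["ToyClass"] = None  # assumption: single inheritance only


def duplicate_names(package_element_names):
    """Rule 1: elements in the same package must be uniquely named."""
    counts = Counter(package_element_names)
    return sorted(name for name, n in counts.items() if n > 1)


def all_features(cls):
    """Features of a class plus those inherited from its superclasses."""
    features = set()
    while cls is not None:
        features |= cls.features
        cls = cls.superclass
    return features


def unknown_slots(cls, slot_names):
    """Rule 2: every slot must relate to a feature of the class or a superclass."""
    return sorted(set(slot_names) - all_features(cls))


def fork_is_well_formed(edge_kinds):
    """Rule 3: edges at a fork node are all object flows or all control flows."""
    return len(set(edge_kinds)) <= 1


person = ToyClass("Person", frozenset({"name"}))
employee = ToyClass("Employee", frozenset({"salary"}), superclass=person)

print(duplicate_names(["Order", "Customer", "Order"]))        # ['Order']
print(unknown_slots(employee, ["name", "salary", "colour"]))  # ['colour']
print(fork_is_well_formed(["control", "control", "object"]))  # False
```

Note what these checks do not say: nothing here tells you what a Person does at run time.  They only rule out models that are malformed, which is exactly the sense in which they behave like axioms.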