Isolation in Maestro

Disclaimer: The Maestro project has been renamed. Although this article refers to it as ‘Maestro’, the project is now known by the codename ‘Axum’.

Copied from 'Concurrently Speaking'  

As noted in the Dr. Dobb's article, Maestro is primarily about establishing isolation domains so that we can cut down on the number of undocumented dependencies between components. With a language like C# or VB, any two references of type T could be referring to the same object, and if you consider a whole object graph, you have to keep track of all the references within the graph. In C++, you don't quite know where your pointers have been, or what type they started out as, so the problem is even worse there.
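Here is a minimal illustration of the aliasing problem. It is written in Python rather than C# so that it runs as-is, and the `Account` class and `apply_fee` function are hypothetical. Two references of the same type end up pointing at one object, so a write through either one is visible through both.

```python
class Account:
    """A mutable object: anyone holding a reference can change its state."""

    def __init__(self, balance: int) -> None:
        self.balance = balance


def apply_fee(account: Account) -> None:
    # The callee mutates whatever object it was handed; the caller may not expect it.
    account.balance -= 5


a = Account(100)
b = a              # b is an alias of a, not a copy
apply_fee(b)       # a write through one reference...
print(a.balance)   # ...shows up through the other: prints 95
print(a is b)      # prints True
```

Nothing in the declared types of `a` and `b` records that they alias each other. Across a whole object graph, that bookkeeping falls entirely on the programmer.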

Sometimes, keeping track of references in your own ad hoc way is easy to do, for example when you have a very small program that doesn't call into libraries you don't know much about and you're the only developer working on it. Or maybe you have spent a lot of time on data design and carefully architected your application to avoid concurrency issues. If so, what happens when you start on the next version of the software, suddenly under customer pressure to deliver ever-increasing value quickly?

It is more or less against the nature of object-oriented languages to restrict access to objects in the ways that would make writing parallel programs easier and safer, so we need to look elsewhere for inspiration.

There are several places to look. Functional languages, for example, offer a great solution by not allowing side effects. Without side effects, there is no reader/writer competition, and data races cease to be a concern. Of course, most interesting computer activities are all about side effects, so we need to escape the model from time to time. That doesn't diminish the value of the functional approach to programming: you have significantly reduced the number of areas in your application that you have to manage yourself. That is very valuable in itself, unless you are a theorist for whom only a completely pure model is acceptable.
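The following Python sketch shows why purity removes the need for coordination. `normalize` is a hypothetical helper that reads only its argument and returns a new value. Because of that, it can be mapped across several threads with no locks, and the result cannot depend on how the threads are scheduled.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Point = Tuple[float, float]


def normalize(point: Point) -> Point:
    """Pure: reads only its argument and returns a new tuple; mutates nothing."""
    x, y = point
    length = (x * x + y * y) ** 0.5
    return (x / length, y / length)


points: List[Point] = [(3.0, 4.0), (6.0, 8.0), (0.0, 2.0)]

# No locks are needed: no call can observe or disturb another call's data,
# so the thread schedule cannot change the outcome.
with ThreadPoolExecutor(max_workers=3) as pool:
    parallel = list(pool.map(normalize, points))

sequential = [normalize(p) for p in points]
print(parallel == sequential)  # prints True
```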

Inspiration from the Web

This need to escape the model is not what makes us look elsewhere. The real reason is that all the mainstream platforms are unsuitable for functional programming, because they were designed with imperative languages in mind. With Maestro, we instead looked to the web for inspiration: it also offers an isolation model, one based on separate address spaces. Simply put, if a pointer isn't valid outside its own address space and you cannot send it to another one, you don't have to worry as much about aliasing.

Of course, separate address spaces have a very high overhead. We are therefore trying to use the model rather than the implementation, letting a compiler enforce the constraints rather than the OS (compilers are particularly good at that).

Domains

In Maestro, the key isolation concept is a domain, which limits the runtime scope of data to its compile-time scope. In other words, objects that are created within a particular domain don't escape it. The only things that may escape a domain are copies of its data and instances of immutable types (.NET doesn't have many of these, but String is one example).

A domain looks like this:

domain D1
{
    object obj = new object();
    string str = "Hello!";
}

You cannot call a method on a domain from outside it -- all its methods are either private or protected; the only thing you can do from the outside is create the domain:

var d = new D1();
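Python has no compiler that can enforce domain boundaries, so what follows is only a runtime sketch of the escape rule described above. Values leaving a domain are either immutable, and therefore safe to share, or copied. The `escape` helper and the list of immutable types are assumptions made for illustration and are not part of Maestro.

```python
import copy
from typing import Any

# Assumption: only these built-in immutable types may cross the boundary by reference.
IMMUTABLE_TYPES = (str, int, float, bool, bytes, type(None))


def escape(value: Any) -> Any:
    """Hypothetical boundary rule: share immutables, deep-copy everything else."""
    if isinstance(value, IMMUTABLE_TYPES):
        return value
    return copy.deepcopy(value)


# Stand-in for D1's state: a mutable object and an immutable string.
domain_state = {"obj": {"count": 0}, "str": "Hello!"}

sent_obj = escape(domain_state["obj"])
sent_str = escape(domain_state["str"])

sent_obj["count"] = 99                  # the receiver mutates its copy...
print(domain_state["obj"]["count"])     # ...but the domain's object is untouched: prints 0
print(sent_str is domain_state["str"])  # prints True: the immutable str is shared safely
```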

Agents

So how do you manipulate the state? After all, data we cannot reach is just a waste of memory. We give you access to domain data via agents, which run on a thread other than the "caller's." Agents are active components, while domains are inactive. This means that agents may have their own control flow and act independently of the client that created them.

Agents also cannot have their methods called from outside the body of the agent. In fact, agent instances are not created using a constructor, nor do we ever get to hold a reference to an agent (which also makes reflection-based invocation harder). Instead, when we create an agent instance, the Maestro runtime establishes a communication channel for us to use when talking to the agent. This is called the agent's primary channel, and its type is declared explicitly in the agent declaration:

domain D1
{
    object obj = new object();
    string str = "Hello!";

    agent A1 : channel C1
    {
        A1()
        {
            var startWith = receive(PrimaryChannel::FirstMessage);
            ...
            PrimaryChannel::Result <-- 10;
        }
    }
}

As you can see, this agent has a channel type called 'C1' and starts its work by receiving a message from its primary channel. 'receive' is a built-in Maestro function and one of three ways to receive messages coming from outside a domain. The agent ends by sending the value '10' on the Result port as the result of its work.
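The shape of A1 can be approximated in Python with a thread and a pair of queues. The assumptions are loose: `C1`, `a1_body`, and `create_agent` are hypothetical names, each port is modelled as a `queue.Queue`, and nothing stops Python code from sharing state the way the Maestro compiler would. What the sketch does show is the protocol. The creator never holds the agent, only its primary channel, and it talks to the agent purely by sending and receiving messages.

```python
import queue
import threading
from dataclasses import dataclass, field


@dataclass
class C1:
    """Hypothetical stand-in for channel type C1: one queue per port."""
    FirstMessage: queue.Queue = field(default_factory=queue.Queue)
    Result: queue.Queue = field(default_factory=queue.Queue)


def a1_body(primary: C1) -> None:
    # Like receive(PrimaryChannel::FirstMessage): block until a message arrives.
    start_with = primary.FirstMessage.get()
    # ... the agent's real work with start_with would go here ...
    # Like PrimaryChannel::Result <-- 10: send the result out on the Result port.
    primary.Result.put(10)


def create_agent(body) -> C1:
    """Run the agent on its own thread; the caller gets only the channel back."""
    channel = C1()
    threading.Thread(target=body, args=(channel,), daemon=True).start()
    return channel


chan = create_agent(a1_body)         # no reference to the agent itself escapes
chan.FirstMessage.put(42)
result = chan.Result.get(timeout=5)
print(result)                        # prints 10
```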

As declared, A1 instances only have access to immutable domain state. While the string instance is immutable, the reference 'str' itself is not, so A1 does not have access to anything in D1. Because they cannot touch mutable domain state, A1 instances can safely run in parallel with all other agent instances inside or outside the domain.

We can give an agent access to domain state by adding a keyword to its declaration:

domain D1
{
    object obj = new object();
    string str = "Hello!";

    agent A1 : channel C1
    {
        A1()
        {
            var startWith = receive(PrimaryChannel::FirstMessage);
            ...
            PrimaryChannel::Result <-- 10;
        }
    }

    reader agent A2 : channel C1
    {
        A2()
        {
            var startWith = receive(PrimaryChannel::FirstMessage);
            ...
            String myStr = str;  // can read
            str = "new string";  // error: cannot write
            PrimaryChannel::Result <-- 10;
        }
    }
}
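Maestro rejects the write to 'str' at compile time, but Python can only refuse it at run time. One rough, hypothetical analogue is to hand a reader nothing but a `types.MappingProxyType` view of the domain's fields. Unlike the guarantee described for A2, this view is shallow. It stops fields from being reassigned, but it would not stop a mutable object they refer to from being modified.

```python
from types import MappingProxyType

# Hypothetical stand-in for D1's fields.
d1_fields = {"obj": object(), "str": "Hello!"}


def a2_body(fields) -> str:
    """Analogue of reader agent A2: it only ever sees a read-only view."""
    my_str = fields["str"]            # can read
    try:
        fields["str"] = "new string"  # cannot write: the view rejects assignment
    except TypeError:
        print("write rejected")
    return my_str


# MappingProxyType is a live, read-only view over the underlying dict.
result = a2_body(MappingProxyType(d1_fields))
print(result)            # prints Hello!
print(d1_fields["str"])  # prints Hello! (unchanged)
```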

Unlike instances of A1, instances of A2 may read the domain fields and the instances they refer to, and may use them in their work. They may not, however, modify either the fields or the instances they refer to. To do that, the agent has to be declared a 'writer':

domain D1
{
    object obj = new object();
    string str = "Hello!";

    agent A1 : channel C1
    {
        A1()
        {
            var startWith = receive(PrimaryChannel::FirstMessage);
            ...
            PrimaryChannel::Result <-- 10;
        }
    }

    reader agent A2 : channel C1
    {
        A2()
        {
            var startWith = receive(PrimaryChannel::FirstMessage);
            ...
            String myStr = str;  // can read
            str = "new string";  // error: cannot write
            PrimaryChannel::Result <-- 10;
        }
    }

    writer agent A3 : channel C1
    {
        A3()
        {
            var startWith = receive(PrimaryChannel::FirstMessage);
            ...
            String myStr = str;  // can read
            str = "new string";  // and write
            PrimaryChannel::Result <-- 10;
        }
    }
}

Here, A3 may change the values of 'obj' and 'str' or modify the instances they refer to. In this case, neither instance has any state that can be modified, but 'obj' could later refer to one that does. All instances of A1 can still run without coordinating with other agents, but instances of A2 and A3 must coordinate their executions.

The reader/writer attribution is what drives this coordination: any number of A2 instances may run in parallel as long as no A3 instance is running, and only one A3 instance may be executing code at any given point in time.
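Python's standard library has no readers-writer lock, so the following sketch builds a minimal one on `threading.Condition` to illustrate the rule just described. It is an analogue, not a description of how the Maestro runtime schedules agents. In particular, it has no writer preference, so a steady stream of readers could starve a writer.

```python
import threading


class ReaderWriterLock:
    """Any number of readers together, or one writer alone; no fairness policy."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self) -> None:
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self) -> None:
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # a waiting writer may now proceed

    def acquire_write(self) -> None:
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True

    def release_write(self) -> None:
        with self._cond:
            self._writer = False
            self._cond.notify_all()  # wake waiting readers and writers alike


lock = ReaderWriterLock()

# Three A2-like readers: the barrier only opens if all three hold the lock at once.
together = threading.Barrier(3, timeout=5)
readers_overlapped = []


def reader() -> None:
    lock.acquire_read()
    try:
        together.wait()
        readers_overlapped.append(True)
    finally:
        lock.release_read()


threads = [threading.Thread(target=reader) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# An A3-like writer excludes readers: a reader started now must wait.
entered = threading.Event()


def late_reader() -> None:
    lock.acquire_read()
    entered.set()
    lock.release_read()


lock.acquire_write()
late = threading.Thread(target=late_reader)
late.start()
blocked_while_writing = not entered.wait(0.2)
lock.release_write()
late.join(timeout=5)

print(len(readers_overlapped))  # prints 3
print(blocked_while_writing)    # prints True
```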

How do agents yield to each other, then? They do so by receiving messages: waiting for a message means giving up your execution rights until the message is available. Thus, all coordination between agents is achieved via message passing.
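The following hypothetical two-agent sketch in Python shows coordination achieved purely through messages. Each agent blocks in `queue.Queue.get`, this sketch's stand-in for `receive`, until the other one sends something. The sketch models only the message-passing half of the story. It does not model how a waiting writer gives up its access to domain state.

```python
import queue
import threading

log = []
to_b: queue.Queue = queue.Queue()
to_a: queue.Queue = queue.Queue()


def agent_a() -> None:
    for i in range(3):
        to_b.put(i)            # send a request
        reply = to_a.get()     # block: A yields until B's reply arrives
        log.append(("A got", reply))


def agent_b() -> None:
    for _ in range(3):
        n = to_b.get()         # block: B yields until A's request arrives
        log.append(("B got", n))
        to_a.put(n * 10)       # reply, which lets A continue


workers = [threading.Thread(target=agent_a), threading.Thread(target=agent_b)]
for w in workers:
    w.start()
for w in workers:
    w.join()

# The blocking receives force a strict alternation without any explicit lock:
# [('B got', 0), ('A got', 0), ('B got', 1), ('A got', 10), ('B got', 2), ('A got', 20)]
print(log)
```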

The Maestro agents concept is very much related to C++ agents, which I discussed at and after PDC. In managed code, we have a lot more infrastructure at our disposal to enforce constraints. For example, creating a new domain language is much more reasonable for .NET than for Win32.

In this post, I didn't go into detail on how to define the channels that agents use to coordinate their work, nor on how Maestro interacts with the rest of .NET in a safe manner. Several other concepts also need explanation, such as message passing, data flow, failure models, protocols and payload schema, but they will have to wait until another time.

 

Niklas Gustafsson