ActorFx Applied in Dynamically Distributed Social Media Apps

08/27/2013

From the ActorFx team:

Brian Grunkemeyer, Senior Software Engineer, Microsoft Open Technologies Hub

Joe Hoag, Senior Software Engineer, Microsoft Open Technologies Hub

Today we’d like to talk about yet another ActorFx example that shows the flexibility and tests the scalability of the ActorFx Framework.

ActorFx provides an open source, non-prescriptive, language-independent model of dynamic distributed objects for building highly available data structures and other logical entities via a standardized framework and infrastructure. ActorFx is based on the idea of the mathematical Actor Model for cloud computing.

The example is a sample social media application called Fakebook that demonstrates a simple way to apply actors to large-scale problems by decomposing them into small units. This demo highlights ActorFx’s ability to dramatically simplify cloud computing by showing that surprisingly few concepts are needed to build a scalable and interesting application with a relatively simple programming model. The example is part of the latest build on the ActorFx CodePlex site.

ActorFx Framework Background

The ActorFx Framework lets developers easily build objects for the cloud, called actors. These actors allow developers to send messages, respond to messages, and create other actors. Our Actor Framework adds in a publish/subscribe mechanism for events as well. Using these, we have built the start of a Base Class Library for the Cloud, including CloudList<T> and CloudDictionary<TKey, TValue> modelled after .NET collections. We’ve harnessed the ActorFx Framework’s pub/sub mechanism to build an ObservableCloudList<T> that is useful for data binding collections to UI controls. These primitives are sufficient to build a useful system we call Fakebook, a sample social media application.

Fakebook Design

Social media apps have many properties that map quite well to actors. Most data storage is essentially embarrassingly parallel – users have their own profile, their own photos, their own list of friends, etc. Operations that span multiple users, such as making friends or posting news items, require calling into other people’s objects or publishing messages to alert them to new information. All of these map well to the Actor Framework’s concept of actors, including message passing between actors and our publish/subscribe event pattern.

Let’s decompose a social network into its constituent parts. At the core, a social network provides a representation of a person’s profile, their friends, their photos, and a newsfeed. All of this is tied together using a user name as an identifier. Other features could be added on top of this, such as graph search, a nice timeline view, the ability to play games, or even advertising. However, those are out of scope for our Fakebook sample, which is demonstrating that the core parts could be built using actors.

See Figure 1 for a clear breakdown of the UI into constituent parts.

Figure 1 - Fakebook UI Design

The core idea of Fakebook is to model a user as a set of actors. In our model, a Fakebook user is represented by the following actors:

· A person actor that tracks their name and all basic profile information (like gender, birthday, residence, etc).

· A friends list, which is a CloudList<String> containing the account name of all friends.

· A photos dictionary, which is a CloudStringDictionary<PictureInfo>, mapping image name to a type representing a picture and its interesting characteristics (such as author, licensing information, etc).

· A set of news posts made by the author, as a CloudList<String>.

· A newsfeed actor that aggregates the news posts of all of a person’s friends.

Interaction between actors can be done in one of two ways: actor-to-actor method calls (either via IActorProxy’s Request or Command methods), or via events sent via our publish/subscribe mechanism. For Fakebook, we use a mix of approaches. When constructing other actors or adding friends, we use IActorProxy’s Request.

The newsfeed actor works entirely based on publish/subscribe events. The newsfeed actor is subscribed to all of the person’s friends’ news posts. Similarly, the newsfeed actor is also subscribed to the person’s friends list, so whenever a friend is added or removed (unfriended), the newsfeed actor knows to subscribe or unsubscribe as appropriate. For interacting with the client, we also use publish/subscribe messages. The client can data-bind an ObservableCloudList<T> to a UI control like a ListBox. This lets the actor in the cloud push updates seamlessly to the client-side UI. See Figure 2 for an example showing how publish/subscribe messages are sent between actors as well as from actors to the client UI.

Figure 2 - Fakebook Publish/Subscribe messages: Actor-to-Actor and Actor-to-Client
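The subscribe/unsubscribe bookkeeping described above can be sketched with plain C# events. This is a minimal in-memory illustration, not the ActorFx publish/subscribe API: PostStream and NewsfeedSketch are hypothetical names introduced here, and a real newsfeed actor would receive these notifications as messages rather than as local method calls.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for one friend's list of news posts (not an ActorFx type).
public sealed class PostStream
{
    public event Action<string> Posted;

    public void Publish(string post) => Posted?.Invoke(post);
}

// Hypothetical newsfeed that reacts to friends-list changes by
// subscribing to, or unsubscribing from, each friend's posts.
public sealed class NewsfeedSketch
{
    private readonly Dictionary<string, PostStream> _subscriptions =
        new Dictionary<string, PostStream>();

    public List<string> Items { get; } = new List<string>();

    // Called when the friends list reports that a friend was added.
    public void OnFriendAdded(string account, PostStream posts)
    {
        if (_subscriptions.ContainsKey(account)) return; // already subscribed
        _subscriptions[account] = posts;
        posts.Posted += OnPost;
    }

    // Called when the friends list reports that a friend was removed.
    public void OnFriendRemoved(string account)
    {
        if (_subscriptions.TryGetValue(account, out var posts))
        {
            posts.Posted -= OnPost;
            _subscriptions.Remove(account);
        }
    }

    private void OnPost(string post) => Items.Add(post);
}
```

In this sketch, posts published after OnFriendRemoved no longer reach the feed, which mirrors the unfriending behavior described above.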

We’ve extended support for LINQ queries to allow transforming observable collections from one type to another. This is very useful for bridging from cloud-based data types (such as account names stored as strings) to rich client-side view model types (such as a FakebookPerson object). LINQ’s Select can be used to change an ObservableCloudList<String> into an IEnumerable<FakebookPerson>, and we’ve done the engineering work to preserve the event stream of updates as well. Consider the following code:

private ObservableCloudList<String> _friends = …;

// Here, we need an ObservableCloudList<String> converted to an observable sequence
// that also passes through INotifyCollectionChanged events. When exploring how to
// build this, we settled on using a Select LINQ operator for our type. Due to
// some problems with C# method binding rules, we had to add an interface
// to express the right type information. But, this works.
var people = _friends.Select(accountName =>
    new FakebookPerson(App.FabricAddress, accountName, App.ConnectThroughGateway));

Dispatcher.Invoke(() => FriendsListBox.ItemsSource = people);

By using client-side view model types, a client application’s data binding code can easily access properties such as a person’s first & last names as well as profile picture. This information is not as easily obtainable via just the user’s account name, so a transformation like this helps keep the level of abstraction high when writing client apps while preserving the dynamic nature of the underlying data.

Deploying Code

Fakebook’s NewsFeed and Person actors were implemented in a curious way. For our collection actors, we built the application and then used a complex, actor-specific deployment directory structure with many supporting files in place to describe the actor. This has the advantage that our infrastructure knows what a list actor is, in great detail. For Fakebook, however, we employed a more flexible approach. The NewsFeed and Person actors are each defined in a .NET assembly containing one type with a number of actor methods. Rather than deploying them as dedicated actor applications, we deploy these assemblies into instances of our empty actor implementation. This makes it easy to update the implementation, and is a bit easier to manage.

For the Actor Runtime’s infrastructure, we create a service consisting of the empty actor app, called “fabric:/actor/fakebook”. We then create instances of that service with names like “fabric:/actor/fakebook/People” and “fabric:/actor/fakebook/Newsfeed”. Using a partitioning scheme, we then create individual IActorStates for individual accounts. So an account name like “Bob.Smith” is used as a partition key for the People and Newsfeed actors. Fakebook makes use of similarly partitioned list and dictionary actors as well.

Reliability

The Actor Framework provides for high availability by allowing multiple replicas of an actor to run within the cloud. If a machine holding one replica crashes, then we already have other replicas to choose from to continue running the service. The number of replicas is configurable, and for Fakebook we are using a total of two replicas per service. The Actor Runtime will designate one as the primary and a second as the secondary. Additional replicas all become secondaries.

As you may have read in the Actor Framework documentation, actors achieve a limited transaction-like set of semantics via a quorum commit. Changes are not committed to the primary until they are first written to a quorum of secondaries. After that, the changes are committed to the primary and acknowledged to the client.

Meanwhile, requests from the client to an actor are auto-idempotent. Consider a client that makes a request and never receives an acknowledgement. This could happen for multiple reasons. The server could have crashed before completing the operation, or after completing the operation but before acknowledging it. Similarly, network connectivity could have been lost. The Actor Framework solves this by assigning a unique sequence number to every request and storing the result of the last request in the actor’s replicated state. By doing this, if a client issues request #5 and loses its network connection, it can reconnect and then safely re-issue request #5. If the operation already ran, the client will get the previously cached result from actor state. If the request didn’t successfully complete the first time, the request will now execute. Importantly, the complexity of handling this is built into the client-side & server-side logic of the Actor Framework, so users do not need to think about this. They simply call a method.
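The retry behavior described above can be sketched as follows. This is only an illustration of the idea under simplifying assumptions, not the Actor Framework’s implementation: IdempotentActorSketch is a hypothetical class, it tracks a single client, and its two fields stand in for values that would live in the actor’s replicated state.

```csharp
using System;

// Hypothetical actor-side request handler that deduplicates retries.
public sealed class IdempotentActorSketch
{
    private long _lastSequence = -1; // sequence number of the last completed request
    private string _lastResult;      // cached result of that request

    // Counts how many times an operation actually ran (for illustration only).
    public int ExecutionCount { get; private set; }

    public string Handle(long sequence, Func<string> operation)
    {
        if (sequence == _lastSequence)
            return _lastResult;      // retry of a completed request: replay the result

        if (sequence < _lastSequence)
            throw new InvalidOperationException("Stale request.");

        // First time this request is seen: run it and remember the outcome.
        _lastResult = operation();
        _lastSequence = sequence;
        ExecutionCount++;
        return _lastResult;
    }
}
```

Calling Handle(5, ...) twice runs the operation once and returns the same result both times, which is the property that makes re-issuing request #5 safe.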

One not-yet-implemented feature is persistence. All data in actors is made highly available via replication across machines, but all that data is stored in memory. Actor state is not currently persisted to disk or any other network storage mechanism like an Azure blob or table. One consequence is that if the power goes out to the entire cluster, you lose everything. This is an active area for future development.

Scalability & Performance

A traditional database system might model a user using several tables, with one row for profile information, and multiple rows representing friends & images for each person. Scalability challenges would be hit once you exceed the number of operations possible on an individual database. (For example, SQL Server’s new Hekaton engine may max out around 60,000 operations/second.) While databases can go a long way, enterprises need systems capable of scaling much further. Fakebook is an attempt at building a competing vision for data storage. While scale up is important, an actor model with relatively easily separable data like Fakebook should be ideal for scale out. Each actor can maintain its own state in memory, local to that actor. The hope is that a database acting as a single point of failure is not necessary.

One common criticism of actor models is that while they are easy to write, they require a significant amount of optimization. We found the same problem, and needed to invest in multi-tenant actors to improve the scalability. Performance is still very much a work in progress for the Actor Framework, but we hope to show some of the optimizations we found particularly valuable, as well as some we hope to make in the future.

Partitioning

One of the challenges is to get the most use out of each machine. There are a finite number of sockets on each machine. Similarly, new actors require allocating state and entries in a naming table that can slow things down. The creation of service instances on the Actor Runtime is unfortunately slow and can gate our performance. Mapping one actor to one process is a horrible idea, and mapping one actor to shared state within a process can often be insufficient. To solve this, we looked into partitioned service instances for hosting multi-tenant actors.

Think of partitioning as creating buckets for actors on various machines. Those buckets can be empty, or you can fill them with multiple actors. By doing this, the cost of creating the buckets is amortized over the actors placed in them. Each individual actor is created within its appropriate bucket, and receives its own isolated version of actor state. See Figure 3.

Figure 3 - Partitioned List Actors

This allows for actor creation times to be vastly faster, by about 3 orders of magnitude.

Now let’s draw a more complete picture. The Actor Runtime will use processes on various machines for each service type (like a list service, a dictionary service, our empty actor service, etc). Each process hosts zero or more service instances (both primaries and secondaries, though let’s ignore secondaries for now). Processes take a while to spin up and service instances are expensive to create. In a naïve hosting scenario without partitioning, consider a list service and a dictionary service with many instances, mapping to one collection each. Here is what will be running on a cluster. Service instances are in blue below.

Figure 4 - Non-Partitioned Hosting

In the picture above, creating new individual collections requires creating new service instances, which is an expensive operation. With partitioning in the picture, we create fewer service instances.

Figure 5 - Partitioned Hosting

Replication is done in the Actor Framework at the service instance level. So in this picture, “List A” and “List B” both share the same IActorState in terms of replication. However, this could lead to conflicts if both lists used a field of the same name, allowing one list to scribble over values from a separate list. The Actor Framework provides a further level of state isolation (an IsolatedActorState) that is a sub-space within an IActorState. So partitioned actors share the same replication mechanism & characteristics, but get their own isolated view of their state.
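The isolation idea can be illustrated with a small in-memory sketch. It assumes nothing about the real IActorState or IsolatedActorState interfaces: SharedStateSketch is a hypothetical class in which one shared store (the unit that would be replicated) hands out a separate sub-space per actor name.

```csharp
using System.Collections.Generic;

// Hypothetical shared store for one partition. All sub-spaces live in the
// same object, so they would be replicated together.
public sealed class SharedStateSketch
{
    private readonly Dictionary<string, Dictionary<string, object>> _subSpaces =
        new Dictionary<string, Dictionary<string, object>>();

    // Returns the sub-space for one actor, creating it on first use.
    public Dictionary<string, object> GetIsolated(string actorName)
    {
        if (!_subSpaces.TryGetValue(actorName, out var space))
        {
            space = new Dictionary<string, object>();
            _subSpaces[actorName] = space;
        }
        return space;
    }
}
```

With this shape, “List A” and “List B” can both store a field called Count without overwriting each other, because each write lands in its own sub-space.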

Partitioning affects how the name of an individual list is mapped to the place where it is hosted. Without partitioning, the Actor Runtime load-balances at the service instance level and allocates instances to machines in a reasonable way. With partitioning, however, the name of a list is mapped to a list partition by hashing the name and then mapping the hash onto a range. For example, a name like “List A” may hash to a value between 1 and 100, and all hash values in the range 1-25 are mapped to list partition 1, 26-50 to list partition 2, etc.
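The name-to-partition mapping in the example above can be sketched as follows. The hash function is an assumption: the text does not say which one the Actor Runtime uses, so this sketch uses 32-bit FNV-1a simply because it is stable across processes (unlike string.GetHashCode on newer .NET runtimes). PartitionSketch and its method names are hypothetical; the 1-100 range and the four partitions of 25 slots follow the example in the text.

```csharp
using System.Text;

public static class PartitionSketch
{
    // 32-bit FNV-1a over the UTF-8 bytes of the name (assumed hash function).
    public static uint StableHash(string name)
    {
        unchecked
        {
            uint hash = 2166136261;
            foreach (byte b in Encoding.UTF8.GetBytes(name))
            {
                hash ^= b;
                hash *= 16777619;
            }
            return hash;
        }
    }

    // Maps a name to a slot in the range 1..100.
    public static int HashToSlot(string name) => (int)(StableHash(name) % 100) + 1;

    // Maps slots 1-25 to partition 1, 26-50 to partition 2, and so on.
    public static int SlotToPartition(int slot) => (slot - 1) / 25 + 1;

    public static int PartitionFor(string name) => SlotToPartition(HashToSlot(name));
}
```

Because the mapping depends only on the name, any client or actor can compute which partition hosts “List A” without consulting a lookup table.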

Fakebook employs partitioning for each of the actors it uses – lists, dictionaries, person actors & news feed actors. This substantially reduced the cost to create new people.

Proxy Problems

Another challenge involves actor to actor communication, where we use IActorProxy to represent a communications channel between two actors. Our initial design required a unique proxy object for every actor-to-actor communication, and cached proxies for quick reuse later. In a simple example, I envisioned 256 people on Fakebook, and each of those 256 people would be a friend with all 255 other people. However, this approach was flawed. Keeping those IActorProxy objects cached required keeping just under 64K sockets open. We tried developing this on one machine and ran out of sockets!
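To see where that number comes from, count one cached proxy for each ordered pair of friends, assuming (as the description above implies) one open socket per cached proxy. ProxyCountSketch is a hypothetical helper used only for this arithmetic.

```csharp
public static class ProxyCountSketch
{
    // Each person holds a proxy to every other person.
    public static int CachedProxies(int people) => people * (people - 1);
}

// ProxyCountSketch.CachedProxies(256) returns 65280,
// which is just under 64K (65,536).
```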

To fix this, we are exploring two approaches – sharing sockets when talking between actor partition buckets, as well as ditching the caching & adding in async support for establishing new actor proxies. We anticipate this will contribute substantial wins.

Batching Data Transfers

One of the most significant ways to improve performance is to reduce the chattiness of your protocols over the network. As a simple example, one of our Actor Framework performance tests adds one million integers to a CloudList<T> and sorts them. Adding one million elements one at a time was horribly inefficient. Instead, we needed to change the protocol to support batching to get a higher level of performance. Additionally, readers must note that method calls that turn into network calls are significantly less reliable than normal method calls. These considerations led to a new interface, IListAsync<T>:

using System.Collections.Generic;
using System.Threading.Tasks;

namespace System.Cloud.Collections
{
    public interface IListAsync<T> : ICollectionAsync<T>
    {
        // One network call per element
        Task<T> GetItemAsync(int index);
        Task SetItemAsync(int index, T value);
        Task<int> IndexOfAsync(T item);
        Task InsertAsync(int index, T item);
        Task RemoveAtAsync(int index);

        // Less chatty versions
        Task AddAsync(IEnumerable<T> items);
        Task RemoveRangeAsync(int index, int count);
    }
}

The AddAsync and RemoveRangeAsync methods will perform much better. Additionally, since they are async methods, they allow clients to more easily adopt async method calls as a common design pattern, then use continuations to resume execution when those operations have completed.
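The effect of batching can be illustrated with an in-memory stand-in that counts simulated round trips. This is a sketch under stated assumptions, not a benchmark of CloudList<T>: CountingListSketch<T> and BatchingDemo are hypothetical, the single-item AddAsync(T) overload is assumed for comparison, and each call is treated as exactly one network round trip.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical in-memory list that counts one round trip per call.
public sealed class CountingListSketch<T>
{
    private readonly List<T> _items = new List<T>();

    public int RoundTrips { get; private set; }
    public int Count => _items.Count;

    // Assumed single-item add: one call per element.
    public Task AddAsync(T item)
    {
        RoundTrips++;
        _items.Add(item);
        return Task.CompletedTask;
    }

    // Batched add: one call for many elements.
    public Task AddAsync(IEnumerable<T> items)
    {
        RoundTrips++;
        _items.AddRange(items);
        return Task.CompletedTask;
    }
}

public static class BatchingDemo
{
    public static async Task<int> AddOneAtATimeAsync(CountingListSketch<int> list, int n)
    {
        for (int i = 0; i < n; i++)
            await list.AddAsync(i);
        return list.RoundTrips;
    }

    public static async Task<int> AddInChunksAsync(CountingListSketch<int> list, int n, int chunkSize)
    {
        for (int start = 0; start < n; start += chunkSize)
            await list.AddAsync(Enumerable.Range(start, Math.Min(chunkSize, n - start)));
        return list.RoundTrips;
    }
}
```

Adding 1,000 integers one at a time costs 1,000 calls in this model, while adding them in chunks of 100 costs 10. The chunk size is a tuning choice; the sketch only shows how the call count scales.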

A similar change was made to actor methods. In the example of setting up friendships between multiple people, changing the ForceMakeFriends method to take an array of friends led to a substantial improvement.

More details on how to run the Fakebook example can be found on the CodePlex site, and we are continuing work on the v0.60 release. Subscribe to the CodePlex feed and our blog to be kept up to date on the latest developments.

For those of you who are using ActorFx in implementations, we’d always like to hear more about how it’s useful to you and how it can be improved. We are looking forward to your comments/suggestions, and stay tuned for more cool stuff coming in our next release!