Distributed Cache Capabilities - Data Manipulation

03/01/2011

This is the second entry in the series: data manipulation. Each capability has a description and a generic guideline based on our implementation experience. I have also added a weight; it may look arbitrary, but it is based on the average importance of the capability across our projects.

Capability	Description	Weight
Atomic operations	Usually implemented through Get/Set. They allow clients to read or write an item in a single atomic operation, preventing dirty reads and writes and avoiding race conditions in multi-threaded environments.	5
Guideline	This feature is highly recommended. We advise against any cache that does not offer atomic operations, as they are a fundamental part of a multi-connection environment. The only scenario where their absence can be acceptable is when you are not concerned about dirty reads.
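
As a rough illustration of what "atomic" means here, the following minimal Python sketch uses a hypothetical in-memory `ToyCache` (an assumption for this example, not the API of any real cache product). Each `get` and `set` runs under the same lock and works on a copy, so a reader never sees a half-written value.

```python
import copy
import threading


class ToyCache:
    """Hypothetical in-memory cache; each get/set is one atomic step."""

    def __init__(self):
        self._items = {}
        self._lock = threading.Lock()

    def set(self, key, value):
        # Copy under the lock so later client-side mutation cannot leak in.
        with self._lock:
            self._items[key] = copy.deepcopy(value)

    def get(self, key, default=None):
        # Readers take the same lock, so they never observe a partial write.
        with self._lock:
            return copy.deepcopy(self._items.get(key, default))


cache = ToyCache()
cache.set("order:1", {"id": 1, "total": 0})


def writer(n):
    cache.set("order:1", {"id": 1, "total": n})


threads = [threading.Thread(target=writer, args=(n,)) for n in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

snapshot = cache.get("order:1")
snapshot["total"] = -1                 # mutating the copy does not touch the cache
still_valid = cache.get("order:1")
```

Whichever writer runs last wins, but the stored item is always one complete value written by a single client.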

Capability	Description	Weight
Multi Gets	Multi Gets allow clients to retrieve multiple cache items within the same operation. The return type is usually an array of objects rather than a complex type (e.g. List, Hashtable).	3
Guideline	If you plan to retrieve multiple items at the same time, this is an interesting feature: it reduces network traffic and thread utilization, but it increases CPU utilization, so use it only when needed. Individual gets, on the other hand, require a separate call per retrieval, which affects network and cache performance.
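
The trade-off can be sketched with a hypothetical `CountingCache` client that simply counts simulated round trips. The class and its `get_many` method are assumptions made for this example; real clients name and shape this call differently.

```python
class CountingCache:
    """Hypothetical cache client that counts simulated network round trips."""

    def __init__(self, items):
        self._items = dict(items)
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1          # one call, one round trip
        return self._items.get(key)

    def get_many(self, keys):
        self.round_trips += 1          # one round trip for the whole batch
        # Returns a plain list in key order; missing keys come back as None.
        return [self._items.get(k) for k in keys]


cache = CountingCache({"a": 1, "b": 2, "c": 3})
keys = ["a", "b", "c", "missing"]

one_by_one = [cache.get(k) for k in keys]
trips_individual = cache.round_trips

cache.round_trips = 0
batched = cache.get_many(keys)
trips_batched = cache.round_trips
```

Both approaches return the same values; the batched call uses one round trip where the individual gets use four.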

Capability	Description	Weight
Multi Writes	Multi Writes allow clients to send multiple cache items to the cache system at the same time. The cache then iterates through the items and updates each one accordingly. During this operation, the order is usually not guaranteed.	3
Guideline	Do not confuse this with bulk insertion; the idea is to use a single call to send an array of items. This can be useful in scenarios that require multiple cache entries. For example, it is not needed if you store a single model per operation, but it is very useful if you keep the changes on the client and then send them all at once when the full process has finished. This can help reduce network traffic and thread utilization, but it usually requires extra CPU time and may produce queues.
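
The "keep the changes on the client, then send them together" pattern might look like the sketch below. `BufferedWriter` and its methods are hypothetical names, and a plain dictionary stands in for the remote cache.

```python
class BufferedWriter:
    """Hypothetical client-side buffer that flushes with one multi-write."""

    def __init__(self, store):
        self._store = store        # stands in for the remote cache
        self._pending = {}
        self.calls = 0

    def stage(self, key, value):
        # Keep the change on the client; the last value staged for a key wins.
        self._pending[key] = value

    def flush(self):
        if not self._pending:
            return 0
        self.calls += 1                     # one call carries every staged item
        self._store.update(self._pending)   # the order of application is not relied on
        written = len(self._pending)
        self._pending.clear()
        return written


remote = {}
writer = BufferedWriter(remote)
writer.stage("cart:42:item:1", {"sku": "A", "qty": 1})
writer.stage("cart:42:item:2", {"sku": "B", "qty": 3})
writer.stage("cart:42:item:1", {"sku": "A", "qty": 2})  # replaces the first stage
written = writer.flush()
```

Because the items are keyed independently, the result does not depend on the order in which the cache applies them.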

Capability	Description	Weight
GetAndSet/SetAndGet operation	These operations are usually known as “double-shooters”: the cache receives a request to perform two operations at the same time (e.g. increment and read the result), usually involving a longer lock. These operations avoid race conditions in a multi-threaded environment.	2
Guideline	If the cache supports operations such as increment, decrement or bitwise operations, it can help in scenarios where you need to maintain a single shared value across all your client instances (e.g. the number of clients connected). If the value is not shared across clients, it is preferable to calculate it on the client first and then send a standard “set” to the cache, because these operations can affect concurrency.
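
The connected-clients counter could be sketched as follows. `CounterCache.increment` is a hypothetical operation written for this example; it performs the read, the update and the return of the new value under one lock, which is the "longer lock" the description refers to.

```python
import threading


class CounterCache:
    """Hypothetical cache exposing an atomic increment-and-read operation."""

    def __init__(self):
        self._values = {}
        self._lock = threading.Lock()

    def increment(self, key, delta=1):
        # Read, modify and write happen under one lock; the new value is returned.
        with self._lock:
            new_value = self._values.get(key, 0) + delta
            self._values[key] = new_value
            return new_value


cache = CounterCache()


def client_connects():
    for _ in range(1000):
        cache.increment("connected_clients")


threads = [threading.Thread(target=client_connects) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = cache.increment("connected_clients", 0)  # read without changing the value
```

A separate get followed by a set from each client could lose updates under the same load, which is the race this kind of operation is meant to avoid.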

Capability	Description	Weight
Relationship/Union capability	This capability allows cache items to be related to other cache items or even cache groups. For example, an item that contains an order may be related to another cache item that stores the customer information. This relationship is usually maintained in a separate cache table and is subject to multiple locks, as the items need to be looked up through the relationship table rather than a hash entry.	2
Guideline	Use this feature with care: the relationship model emulates a relational database and should only be used if your requirements call for it. The main reason is the extra hop that the items need to go through every time they are accessed. The relationship table also uses a lock, which can affect concurrency.
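
The extra hop can be made visible with a small sketch. `RelatedCache` is a hypothetical structure invented for this example: it keeps the relationship table apart from the ordinary key lookup and counts how many lookups each access costs. Locking is left out to keep the sketch short.

```python
class RelatedCache:
    """Hypothetical cache with a separate relationship table."""

    def __init__(self):
        self._items = {}       # key -> value (the ordinary hash lookup)
        self._relations = {}   # key -> set of related keys (the extra table)
        self.lookups = 0

    def set(self, key, value):
        self._items[key] = value

    def relate(self, key, related_key):
        self._relations.setdefault(key, set()).add(related_key)

    def get(self, key):
        self.lookups += 1
        return self._items.get(key)

    def get_related(self, key):
        # Extra hop: consult the relationship table, then fetch each related item.
        self.lookups += 1
        return {k: self.get(k) for k in sorted(self._relations.get(key, ()))}


cache = RelatedCache()
cache.set("order:1001", {"total": 250})
cache.set("customer:7", {"name": "Acme"})
cache.relate("order:1001", "customer:7")

order = cache.get("order:1001")              # one hash lookup
direct_cost = cache.lookups
related = cache.get_related("order:1001")    # relationship table plus one item
related_cost = cache.lookups - direct_cost
```

Fetching a single related item costs two lookups here instead of one, and the cost grows with the number of related items.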

Capability	Description	Weight
Ordering capability	This capability gives clients the ability to get results in a specific order; it also applies to writes. The items are usually tagged with a correlation token that is used to process them in a deterministic order.	1
Guideline	This feature is mainly interesting when you need to save items in a particular order. That said, it can easily be emulated by the clients: on the writing side, they can send the items in a specific order if there is any relationship involved; on the reading side, they can order the results on the client using technologies like LINQ. We recommend that cache items be independent from each other and that this feature be avoided.
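
Client-side emulation might be as simple as the sketch below, shown in Python rather than LINQ. The `seq` field is a hypothetical sequence token added by the client; nothing here relies on the cache preserving order.

```python
items = [
    {"key": "step:3", "seq": 3, "payload": "ship"},
    {"key": "step:1", "seq": 1, "payload": "reserve stock"},
    {"key": "step:2", "seq": 2, "payload": "charge card"},
]

# Writing side: the client sorts by its own sequence token before sending.
to_send = sorted(items, key=lambda item: item["seq"])

# Reading side: results may arrive in any order and are sorted on the client.
received = list(reversed(to_send))
ordered = sorted(received, key=lambda item: item["seq"])
payloads = [item["payload"] for item in ordered]
```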

Capability	Description	Weight
Async operations	Asynchronous operations decouple the client from the server. This allows the client to fire a request and continue working rather than waiting for a response from the cache. This usually comes in two modes: “fire and forget”, where the client does not receive any notification, and “async notification”, where the client receives an event once the operation has completed.	4
Guideline	This feature is very useful if you are working on real-time scenarios, where the fire and forget mode comes in very handy. A cache should never be treated as the “only” source but rather as a performance enhancement for the application; reading from and writing to the cache helps but is not critical, which is why a fire and forget model works very well in those scenarios.
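
The two modes can be sketched with Python's standard `concurrent.futures` module. The `cache_set` function and the dictionary behind it are stand-ins invented for this example; a real cache client would expose its own asynchronous API.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

store = {}                      # stands in for the remote cache
store_lock = threading.Lock()
completed = []                  # keys reported by the async notification


def cache_set(key, value):
    with store_lock:
        store[key] = value
    return key


def on_done(future):
    # "Async notification": runs once the write has finished.
    completed.append(future.result())


with ThreadPoolExecutor(max_workers=2) as pool:
    # Fire and forget: submit the write and never look at the result.
    pool.submit(cache_set, "page:home", "<html>home</html>")

    # Async notification: attach a callback instead of blocking on the result.
    future = pool.submit(cache_set, "page:about", "<html>about</html>")
    future.add_done_callback(on_done)
# Leaving the "with" block waits for pending work, only to keep the demo deterministic.
```

In the fire and forget case a failed write goes unnoticed, which is acceptable only because the cache is not the sole source of the data.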

Capability	Description	Weight
Bulk operations	Bulk operations allow caches to load massive chunks of data at the same time (in certain cache technologies this also includes dumping the cache in a single operation). This usually supports fast cache recovery from faults when other media are used to back up the cache.	3
Guideline	If you are treating the cache as critical, you will need backup support, for example a file or a database. Bulk operations allow you to “fast-load” your cache from the last known good state in a recovery event. They also allow you to warm up the cache, making it more efficient.
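
A file-backed dump and fast-load could be sketched as follows. The JSON snapshot format and the `dump_cache` and `bulk_load` helpers are assumptions made for this example, with a dictionary standing in for the cache.

```python
import json
import os
import tempfile


def dump_cache(items, path):
    # Write the whole cache in one operation (the last known good state).
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(items, fh)


def bulk_load(path):
    # Read everything back in one operation instead of one set per item.
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)


live_cache = {f"product:{i}": {"id": i, "price": i * 10} for i in range(1000)}

with tempfile.TemporaryDirectory() as tmp:
    snapshot_path = os.path.join(tmp, "cache_snapshot.json")
    dump_cache(live_cache, snapshot_path)
    live_cache = {}                         # simulate a fault that empties the cache
    live_cache = bulk_load(snapshot_path)   # fast-load, which also warms the cache
```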

Capability	Description	Weight
Object Query Language support	Object Query Language (OQL) is a SQL-style language that allows you to search cache items in a database-like fashion: using SELECT, FROM and WHERE, you can filter your cache items before retrieving them.	2
Guideline	A query language can help you quickly filter cache items when you need several items at the same time. This model can usually search on cache item tags or attributes. This feature is useful if you store items individually rather than grouped (e.g. if you store each order in its own slot instead of storing a single “order” object with all the orders embedded). Note that you may need to know OQL beforehand or have some SQL syntax skills.
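
The idea of filtering on attributes before retrieving values can be sketched without a real OQL engine. In the example below a Python predicate plays the role of the WHERE clause; the `query` helper, the item layout and the query text in the comment are all hypothetical.

```python
# Each order lives in its own slot, with searchable attributes kept beside the value.
cache = {
    "order:1": {"attrs": {"customer": "acme", "total": 250}, "value": "order 1 body"},
    "order:2": {"attrs": {"customer": "acme", "total": 40}, "value": "order 2 body"},
    "order:3": {"attrs": {"customer": "acme", "total": 900}, "value": "order 3 body"},
    "order:4": {"attrs": {"customer": "globex", "total": 500}, "value": "order 4 body"},
}


def query(items, where):
    """Return the keys whose attributes satisfy the predicate, without reading values."""
    return sorted(key for key, item in items.items() if where(item["attrs"]))


# Roughly: SELECT key FROM orders WHERE customer = 'acme' AND total > 100
matches = query(cache, lambda a: a["customer"] == "acme" and a["total"] > 100)
values = [cache[key]["value"] for key in matches]
```

Only the matching items are retrieved afterwards, which is what makes the filter worthwhile when items are stored individually.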

Capability	Description	Weight
LINQ support	LINQ is a Microsoft .NET technology that allows you to query objects based on an expression tree. This allows you to search cache items in a database-like fashion: using SELECT, FROM, JOIN, WHERE and ORDERBY, you can filter your cache items before retrieving them.	2
Guideline	A query language can help you quickly filter cache items when you need several items at the same time. This model can usually search on cache item tags or attributes and works very well with .NET solutions. This feature is useful if you store items individually rather than grouped (e.g. if you store each order in its own slot instead of storing a single “order” object with all the orders embedded). Note that you may need to know the LINQ syntax beforehand or have some SQL syntax skills.

Capability	Description	Weight
Parallelism support	Parallel algorithms allow the cache to make the most of multiple CPU cores. This can make retrieval operations faster, as the cache can schedule filters on different thread queues. In certain caches, this feature is also used during write operations, when multiple writes are allowed or bulk inserts are processed.	4
Guideline	If the scenario involves queries (OQL, LINQ) or relationship retrievals, the parallelism capability becomes really important. As most servers have more than one core, this feature should be a must, as it allows systems to scale up better. Note that it can also add complexity to the cache and, as mentioned under memory management, this can be tricky for certain cache vendors unless they use proven frameworks like PFX or OpenMP.
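
The partition, filter and merge shape of a parallel query can be sketched with a thread pool. This only illustrates the structure: the data and the `filter_partition` helper are invented for the example, and in CPython the global interpreter lock limits the real speed-up for pure Python filters, so a cache server would rely on native threads or a framework such as those named above.

```python
from concurrent.futures import ThreadPoolExecutor


def filter_partition(partition, min_total):
    # Each worker filters only its own slice of the items.
    return [key for key, total in partition if total >= min_total]


items = [(f"order:{i}", i * 5) for i in range(100)]   # (key, total) pairs
workers = 4
partitions = [items[i::workers] for i in range(workers)]

with ThreadPoolExecutor(max_workers=workers) as pool:
    partial_results = pool.map(filter_partition, partitions, [400] * workers)
    # Merge the per-worker results into one sorted answer.
    matches = sorted(key for part in partial_results for key in part)
```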

Capability	Description	Weight
Streaming	Streaming allows caches to send cache items in chunks. This is usually available for large cache items that are not suitable for bulk transfer (e.g. a video). The cache opens a channel with the client and sends the data in blocks, allowing other connections to process their requests without blocking the thread pool.	2
Guideline	This feature is only useful in specific scenarios where large data is used, for example music, videos or large images. Streaming allows the client to receive data blocks, which it then needs to process. Only use streaming if you really have objects larger than 10 MB.
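
Block-wise delivery can be sketched with a generator. The `stream_item` function, the 64 KiB block size and the synthetic payload are assumptions chosen for this example; the client processes each block as it arrives, here by feeding it into a hash.

```python
import hashlib


def stream_item(data, chunk_size=64 * 1024):
    """Yield a large cache item in fixed-size blocks."""
    for offset in range(0, len(data), chunk_size):
        yield data[offset:offset + chunk_size]


# Synthetic 11 MiB payload standing in for a video or a large image.
large_item = bytes(range(256)) * 4096 * 11

digest = hashlib.sha256()
blocks = 0
for block in stream_item(large_item):
    digest.update(block)     # the client processes each block as it is received
    blocks += 1
```

The client never needs the whole item in one response, and the sender can serve other requests between blocks.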

The global weight for this category is 2.75
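
The figure appears to be the simple average of the twelve weights listed above; this is an inference from the numbers rather than something the post states. A quick check:

```python
weights = {
    "Atomic operations": 5,
    "Multi Gets": 3,
    "Multi Writes": 3,
    "GetAndSet/SetAndGet operation": 2,
    "Relationship/Union capability": 2,
    "Ordering capability": 1,
    "Async operations": 4,
    "Bulk operations": 3,
    "Object Query Language support": 2,
    "LINQ support": 2,
    "Parallelism support": 4,
    "Streaming": 2,
}

# Arithmetic mean of the per-capability weights: 33 / 12.
global_weight = sum(weights.values()) / len(weights)
```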

Original post by Salvador Patuel on 01/02/11 here: https://blogs.msdn.com/b/salvapatuel/archive/2011/02/01/distributed-cache-capabilities-data-manipulation.aspx
