After my article on Dealing with Concurrency: Designing Interaction Between Services and Their Agents (http://msdn.microsoft.com/library/en-us/dnbda/html/concurev4M.asp) and my talks at the TechEd in
Hearing people talking about CRUD makes me cringe; it makes me want to react to it. So, let me get that off my chest first.
My allergic reaction may stem from the way I see developers misuse Datasets. Such a developer will write a service that hands you a Dataset; the UI modifies the Dataset and sends it back. The service, or even the stored procedure, then checks the before and after image, the timestamp or the version … ouch!
Why not have a request that states specifically what it entails and specifies when the update should be executed or refused? E.g. buy this when the price is at most x, rather than buy this when the version of the information retrieved in an earlier request was 30256917. Or worse: the old address was …, the new address is … Did the customer move? Then we had better start a complete process. Or did we just correct a typo in the contact information?
Problems with CRUD are that CRUD:
- Loses the context of the change (e.g. the contact info changed because the contact has another function in the organization or the contact info changed because the department was renamed)
- Implies dependencies you may not want (e.g. the sales system refused the order because the price is not the same anymore, even though it has been reduced)
- Does not state dependencies you may need (e.g. only place the order if the time to deliver is less than 10 work days)
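To make the difference concrete, here is a minimal sketch contrasting a version check with a request that states its own condition. The `Offer` class and the functions `crud_buy` and `buy_at_most` are hypothetical names invented for this illustration; they are not taken from any real sales system.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    """Server-side state of a product offer (hypothetical model)."""
    price: float
    version: int


def crud_buy(offer: Offer, version_seen: int) -> bool:
    """CRUD style: accept only if nothing changed since the caller's read."""
    return offer.version == version_seen


def buy_at_most(offer: Offer, max_price: float) -> bool:
    """Business request: accept whenever the caller's stated condition holds."""
    return offer.price <= max_price


# The caller read the offer at version 1, priced at 100.0.
# Before the order arrives, the price is reduced to 90.0 (version 2).
offer = Offer(price=90.0, version=2)

crud_accepted = crud_buy(offer, version_seen=1)          # refused: version moved on
request_accepted = buy_at_most(offer, max_price=100.0)   # accepted: 90.0 <= 100.0
```

The version check refuses the order although the change is in the buyer's favor, which is the unwanted implied dependency. The conditional request carries only the dependency the buyer actually cares about.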
Yet a lot of solutions have been built using the principle of CRUD and at least some of them were very successful. Why did they choose to use CRUD?
… Need to discuss using standard tools for replication …
This seems very reasonable. So why can’t we use CRUD, or can we? Let’s compare CRUD-based service interfaces with a few other cases.
In object-oriented design a class typically has a set of properties. These properties are accessors to the private data. Properties are typically used to keep the internal state consistent; they are not used to trigger complex business logic. These properties are used at various levels in the design. However, not all of the set methods on these properties are declared public. Only those properties that have no side effects when changed are typically made public. The others will be private.
For instance the Customer class in an insurance application may have an address property, but changing the address may result in an extremely complicated business process. Thus this property will typically be private, so the change can only be done after all the side effects have been dealt with. For an address change, the designer would typically offer a method, which internally will use the private property.
Similarly, CRUD implies changing properties (multiple properties at once). Property changes that have side effects, i.e. effects that go beyond keeping internal state consistent, should not be supported through CRUD. For these, requests should be defined that convey the business meaning of the change as well as the conditions under which it should be executed.
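The insurance example can be sketched as follows. This is only an illustration under assumptions: the `Customer` class, the `correct_address_typo` and `relocate` methods, and the `pending_reviews` list are all made-up names, and the "business process" is reduced to recording a note.

```python
class Customer:
    """Hypothetical insurance customer with a read-only address property."""

    def __init__(self, name: str, address: str) -> None:
        self.name = name
        self._address = address      # private state, no public setter
        self.pending_reviews = []    # stands in for the triggered process

    @property
    def address(self) -> str:
        # Reading is free of side effects, so the getter is public.
        return self._address

    def correct_address_typo(self, corrected: str) -> None:
        """Same location, better spelling: no business process needed."""
        self._address = corrected

    def relocate(self, new_address: str) -> None:
        """The customer moved: deal with the side effects, then store."""
        self.pending_reviews.append(f"re-rate policies for {new_address}")
        self._address = new_address


customer = Customer("A. Jones", "1 Mian Street")
customer.correct_address_typo("1 Main Street")   # no review triggered
customer.relocate("22 Harbour Road")             # review triggered
```

Both methods end up writing the same private field, but the caller has to say which of the two things happened. A plain assignment to `customer.address` raises `AttributeError`, because the property has no setter.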
In working with databases CRUD is the normal way of dealing with data. But there are a few notable differences between talking to a database and talking to a service.
- Data in the database is typically normalized. Properties aren’t replicated; a single update typically suffices to maintain consistency. But, I don’t think it’s a major difference if we look at a single service. Good entity design should help and a service could (and should) provide the logic to keep the data consistent. It becomes different if the same information is held in multiple services; then logic external to these services has to keep the services in sync.
- Database updates are transactional. This means that multiple updates that are, at least from the database’s view, unrelated can be combined in a single all-or-nothing transaction. When one of them fails, all fail. Multiple updates to the service are not combined; logic external to the service has to maintain consistency. If the service only provides a CRUD interface, a business action that updates multiple entities must be split into multiple updates, each on a single entity. The service will not offer transactional consistency between them; the caller has that responsibility. But in replication scenarios, the caller does not take that responsibility either.
- Updates to the database are not protected by business logic. Instead, the business logic that protects the data is what performs the database updates, and the updates themselves do not require additional logic to be executed, much like setting the properties of a class in object-oriented design.
I think all three points lead to the same conclusion. Services are not databases, but by only providing CRUD interfaces, they offer a poorer form of database-like behavior. If services should offer business actions and business logic, they should be given the information needed to execute those business actions and enforce that business logic. The service encapsulates entities and protects consistency through business logic. If the original business request affects multiple entities, a service can only provide complete service if it is given enough information about the original request. Then it can execute business logic, apply that business logic even across multiple entities, and maintain both information consistency and transactional consistency.
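As a rough sketch of that last point, the following shows a service that receives one complete business request and applies it to several entities in a single transaction. The schema, the `make_service_db` helper and the `book_trip` function are assumptions made up for this example; only the standard `sqlite3` module is used.

```python
import sqlite3


def make_service_db() -> sqlite3.Connection:
    """Create the service's private store (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE inventory ("
        " item TEXT PRIMARY KEY,"
        " available INTEGER NOT NULL CHECK (available >= 0))"
    )
    db.executemany(
        "INSERT INTO inventory VALUES (?, ?)",
        [("flight", 1), ("hotel", 1), ("car", 0)],
    )
    db.commit()
    return db


def book_trip(db: sqlite3.Connection, items: list) -> bool:
    """Business action: reserve every item of the trip, or none of them."""
    try:
        with db:  # commits on success, rolls back if an exception escapes
            for item in items:
                db.execute(
                    "UPDATE inventory SET available = available - 1"
                    " WHERE item = ?",
                    (item,),
                )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint fired: something was sold out.
        return False


def available(db: sqlite3.Connection, item: str) -> int:
    row = db.execute(
        "SELECT available FROM inventory WHERE item = ?", (item,)
    ).fetchone()
    return row[0]
```

With a CRUD-only interface the caller would issue three separate updates and would have to undo the flight and hotel reservations itself when the car turns out to be unavailable. Here the service knows the whole request, so the rollback is its own responsibility.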
Now, this may all be true, but is it important? As is always the case, sometimes it is and sometimes it isn’t. Architecture and design are all about making trade-offs. So, when is it important to expose business actions instead of CRUD?
- If it is important to maintain consistency, or if there is a reasonable chance of concurrency issues that are not easy to resolve, try to use business actions and avoid CRUD. Booking a trip where it is important to have a flight, hotel and car, or booking an order where both products and service are required at a specific time, are examples of requests that require consistency. The simplest examples of concurrency issues are the general ledger and the stock on hand information; they are hardly ever, if at all, exposed through CRUD. Not because of dependencies, but because of the concurrent nature of updates.
- If, on the other hand, updates are rare or are only done by a single person, or if the updates consist of adding information rather than changing existing information, there is not much of a problem. The contact information in a CRM system will not be changed often. The appointments in my calendar are typically only changed by me. Even if our admin has complete change rights to my calendar, the chance of both of us changing the same appointment is slim, because of the common business practice that she just does not change my calendar unless she has to. In many cases, concurrency issues are guarded against better by business practice or habit, policies and process than by the software. The software should support these and possibly enforce them, but without them, the software enforcement makes little sense.
At Microsoft, the people in the field use only a subset of what the CRM system has to offer for most of their work. They keep their contacts up to date. They add opportunities and activities and keep these up to date as well, and they read the customer information. What if we were to provide this to them via Outlook? Outlook clearly only supports replication, so all changes made to local data would be replicated to the server and into the CRM service. All context other than the new entities and the changes to the existing entities would be lost. Still, I believe it would be a good, even recommended, solution.
Customer information is provided in Outlook but cannot be changed through Outlook.
Creating and updating contacts is not critical. Contact information is shared, but when the contact information is changed, it is a minor change without much business logic. When an update fails, the user may receive a synchronization error and deal with it as he or she deems appropriate. The chance of such a failure would typically be small.
Creation of an opportunity triggers a business process or workflow. But the creation of the opportunity in Outlook provides enough information to that process and this creation does not depend on any other changes. Even if these other changes are related, the opportunity does not have a “transactional” dependency.
Adding activities is not critical. Changing activities might lead to conflicts, but much like appointments, activities can be thought of as personal information, i.e. the owner of the information is known and that ownership is respected by common practice. The activity may be created by one person and then owned by another, but that is normally arranged outside of the software.
More complex cases don’t have to be offered through Outlook. The original solution provides these cases. Most users, and especially occasional users, will have sufficient functionality through Outlook and can hook up to the corporate network in the rare case they need more; more demanding users can still use the original solution and have more overhead, but also more functionality.
So far I’ve discussed CRUD as an all-or-nothing approach. It doesn’t have to be that way. That would only be the case if you were to rely entirely on standard replication mechanisms. A good way of combining approaches is to replicate information from the service and to queue requests to the service. (You can optionally change the local data to reflect those requests, but if those changes are ignored and overwritten after the requests have been processed by the service, these local changes don’t affect the solution.) The business actions in the queue will then be processed, and these actions can follow all recommendations made in “Dealing with Concurrency”. Yes, you may argue that every request is an entity too and that adding a request to the queue is CRUD too. I will not argue the point: if you call that CRUD and that’s how you want to use CRUD, great. You use replication as a messaging mechanism. Processing the replicated requests is exactly the same as processing requests to the service.
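The combination of replicated data and queued requests might look roughly like this. It is a sketch under assumptions, not a description of any real replication product: `BuyRequest`, `PriceService` and `OfflineClient` are hypothetical names, and the "replication" is simply a dictionary copy.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class BuyRequest:
    """A business request that carries its own condition (hypothetical)."""
    item: str
    max_price: float


class PriceService:
    """The service: owns the data and applies the business logic."""

    def __init__(self) -> None:
        self.prices = {"widget": 100.0}
        self.orders = []

    def snapshot(self) -> dict:
        return dict(self.prices)  # what gets replicated to clients

    def handle(self, request: BuyRequest) -> str:
        price = self.prices[request.item]
        if price <= request.max_price:
            self.orders.append((request.item, price))
            return "accepted"
        return "refused"


class OfflineClient:
    """Reads from a local replica; writes only by queueing requests."""

    def __init__(self, service: PriceService) -> None:
        self.service = service
        self.replica = service.snapshot()
        self.outbox = deque()

    def request_buy(self, item: str, max_price: float) -> None:
        self.outbox.append(BuyRequest(item, max_price))

    def synchronize(self) -> list:
        results = []
        while self.outbox:
            results.append(self.service.handle(self.outbox.popleft()))
        # Local data is overwritten with the service's view afterwards.
        self.replica = self.service.snapshot()
        return results
```

A short usage example: the client queues two requests while disconnected, the price changes on the server in the meantime, and synchronization returns one outcome per request.

```python
service = PriceService()
client = OfflineClient(service)
client.request_buy("widget", max_price=100.0)
client.request_buy("widget", max_price=80.0)
service.prices["widget"] = 90.0   # changed while the client was offline
results = client.synchronize()    # one outcome per queued request
```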
Suppose you want to write a solution for a service technician. The technician needs to fill out hours on the road and hours spent servicing, diagnosing and repairing. He or she needs to specify the material used, specify the condition of the devices, etc. The information about the customers, the devices, the maintenance contract per device and the prices for the materials can all be replicated to the notebook. However, the per-customer service information that is entered consists of more than just an invoice; it can be a complex set of requests to the service, and on synchronization a few of those requests may raise problems that need to be resolved before the complete set is accepted.
I need to say something on services encapsulating entities and exposing views and actions. So, more to follow…