Synchronizing with Windows SharePoint Services, Part 2

Introduction

In my last blog post I introduced GetListItemChangesSinceToken and discussed how it can make synchronization more efficient. In this post I'll talk some more about synchronization and take a quick look at GetList, UpdateListItems, and property bags. I'll finish up by discussing conflict detection and performance best practices.

Other Web Services

In addition to GetListItemChangesSinceToken, Lists.asmx defines several other web methods which allow clients to query for changes and make updates. There are several good examples in the Lists web service topic in the Windows SharePoint Services SDK.

GetList

GetList returns field schemas and other list properties.  Clients typically call this before syncing a new list, and then parse the response to match WSS fields to their client-side representation. 

If the field exists as an out-of-box site column, it can be matched by its field ID.  In other cases, the field's internal name can be referenced.

UpdateListItems

UpdateListItems adds, modifies, or deletes list items.  Clients typically call this to keep server items in sync with changes made on the client.

Request

public SoapXml.SoapXmlElement UpdateListItems(string listName, SoapXml.SoapXmlElement updates)

This is the format of the updates parameter:

<Batch [update options]>
   <Method ID="X" Cmd="CMD">
      <Field Name="InternalName">VALUE</Field>
   </Method>
</Batch>

 

A batch is a collection of methods, each of which specifies one of the following values for the Cmd attribute:

 

List of batch methods

Method    Description
New       Create a new item with the specified field values.
Update    Update the specified field values for an item.
Moderate  Change the moderation status for an item (used in the same manner as Update). The ModerationStatus field can only be changed by a Cmd=Moderate call.
Delete    Delete the item identified by the specified field values.

Note   Setting ModerationStatus must be done using a Cmd=Moderate call. Attempting to set ModerationStatus using Cmd=Update or Cmd=New has no effect. For more information about ListItem, refer to the SDK.

 

The ID attribute of the Method tag is only used to correlate the method in the batch with the right item in the result.

 

Update Options

Option                               Description
OnError="Continue"                   Continue processing the batch if errors are encountered
LockSchema="TRUE" ListVersion="N"    Only process the update if the list version matches
ViewName="VIEW"                      Return columns present in this view with the item's data
RootFolder="FOLDERURL"               Perform the update in the context of this folder
Properties="TRUE"                    Return the properties in the item property bag as separate fields
DateInUtc="TRUE"                     The dates updated and returned are in UTC

 

Fields

In updates and deletes, the ID field needs to be supplied in order to identify the item.  If the update is being made on a document library, the FileRef field is also required to identify the document being updated.

When updating an item, only the changed fields need to be supplied.

When adding or changing certain types of items, certain fields may be required. 
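As a rough illustration of the batch format and the field rules above, here is a minimal Python sketch that assembles a Batch element with the standard library. The helper name build_batch and the item IDs and title are made up for the example; only the element and attribute names (Batch, Method, Field, OnError, ID, Cmd, Name) come from the format described in this post.

```python
import xml.etree.ElementTree as ET


def build_batch(methods, on_error="Continue"):
    """Build a <Batch> element from (cmd, fields) pairs.

    `methods` is a list of (cmd, fields) tuples, where `cmd` is one of
    New, Update, Moderate or Delete and `fields` maps internal field
    names to string values. This helper is hypothetical.
    """
    batch = ET.Element("Batch", {"OnError": on_error})
    for index, (cmd, fields) in enumerate(methods, start=1):
        # The Method ID only correlates this method with its <Result>.
        method = ET.SubElement(batch, "Method", {"ID": str(index), "Cmd": cmd})
        for name, value in fields.items():
            field = ET.SubElement(method, "Field", {"Name": name})
            field.text = value
    return batch


batch = build_batch([
    # Update: ID identifies the item; only the changed field is sent.
    ("Update", {"ID": "7", "Title": "Revised title"}),
    # Delete: ID identifies the generic list item to remove.
    ("Delete", {"ID": "9"}),
])
batch_xml = ET.tostring(batch, encoding="unicode")
print(batch_xml)
```

The resulting element would be passed as the updates parameter of UpdateListItems. For a document library, the FileRef field would have to be added next to ID.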

 

Clients may update the property bag in two ways:

Update                                                  Description
<Field Name="MetaInfo">XXX</Field>                      Add or update the specified name/value pairs in the property bag
<Field Name="MetaInfo" Property="Name">VALUE</Field>    Add or update this specific name/value pair in the property bag

 

Note   There is no way for a client to delete individual name/value pairs in the property bag. See Property Bag below for more information.

 

Response

The return value is:

<Results>
   <Result ID="X,CMD">
      <ErrorCode>HR</ErrorCode>
      <ID>ID</ID>
      <z:row ... />
   </Result>
</Results>

 

Tag          Description
ErrorCode    0x00000000 if no error. A hex HR if there was an error.
ID           It is not clear when or why this element is returned; it may appear only for deletes.
z:row        For all commands except Delete, the updated item with all its fields is returned.
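To show how a client might read this response, here is a small Python sketch that correlates each Result with its method and checks the error code. The sample response, its values, and the ows_ attribute names on z:row are invented for illustration, and a real response may carry additional namespaces, which is why the code matches on local tag names only.

```python
import xml.etree.ElementTree as ET

# Made-up response for illustration; not captured from a real server.
SAMPLE_RESPONSE = """
<Results xmlns:z="#RowsetSchema">
  <Result ID="1,Update">
    <ErrorCode>0x00000000</ErrorCode>
    <z:row ows_ID="7" ows_Title="Revised title" ows_owshiddenversion="3" />
  </Result>
  <Result ID="2,Update">
    <ErrorCode>0x81020015</ErrorCode>
    <z:row ows_ID="8" ows_Title="Server title" ows_owshiddenversion="5" />
  </Result>
</Results>
"""


def local_name(tag):
    """Strip any '{namespace}' prefix that ElementTree adds to a tag."""
    return tag.rsplit("}", 1)[-1]


def parse_results(xml_text):
    """Return {method_id: {"cmd", "error_code", "ok", "row"}} per <Result>."""
    results = {}
    for result in ET.fromstring(xml_text.strip()):
        if local_name(result.tag) != "Result":
            continue
        # The Result ID has the form "X,CMD".
        method_id, cmd = result.get("ID").split(",", 1)
        entry = {"cmd": cmd, "error_code": None, "ok": False, "row": None}
        for child in result:
            name = local_name(child.tag)
            if name == "ErrorCode":
                entry["error_code"] = child.text.strip()
                # 0x00000000 means success; anything else is a hex HR.
                entry["ok"] = int(entry["error_code"], 16) == 0
            elif name == "row":
                entry["row"] = dict(child.attrib)
        results[method_id] = entry
    return results


parsed = parse_results(SAMPLE_RESPONSE)
for method_id, entry in parsed.items():
    print(method_id, entry["cmd"], "ok" if entry["ok"] else entry["error_code"])
```

Keying the parsed results by method ID lets the client match each outcome back to the change it queued, regardless of the order in which results come back.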

Property Bag

Property bags are a mechanism for developers to attach their custom data to corresponding objects inside SharePoint.  Property bags let developers transform simple lists into rich data stores, and webs into full applications. The property bag is a virtual container that can store almost any type of value. The SPListItem Properties property returns a property bag for the specified object.

Note   If you use anything besides a String, int, or DateTime for the value, you will get an SPUnsupportedPropertyDataTypeException, with the message "Only String, int, and DateTime datatypes can be used as the value in Properties."

A call to the Update method on the object persists the values set in the property bag. All values can be stored and retrieved by using Web service methods.

You must send the entire property bag; you cannot update just part of it.

For more information about property bags in Windows SharePoint Services, refer to https://msdn2.microsoft.com/en-us/library/ms480101.aspx.

Fields and Properties

SharePoint has very extensible list schemas.  A client that has fixed content types (calendars, tasks, etc) needs to be able to relate its own item properties to specific fields on the list.  Typically this is done by calling GetList to get the field schema before a client syncs to a SharePoint list for the first time.

Item properties in a client are best matched to fields on the server by the field GUID in the list schema.  The internal name of SharePoint fields can also be used to associate client-side properties with their server-side counterparts.  If a client cannot find an appropriate field on the server, it should store the data in the property bag for other clients to consume.  The client should cache these associations so that it doesn't have to fetch the list schema on every call.
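The matching order described above (field GUID first, then internal name, then the property bag) could be sketched in Python like this. The function name, the dictionary shapes, and the placeholder GUIDs are assumptions made for the example; a real client would build server_fields from the GetList response.

```python
def map_properties(client_props, server_fields):
    """Map client property names to server field internal names.

    client_props: {client_name: {"id": GUID or None, "name": internal name or None}}
    server_fields: list of {"ID": GUID, "Name": internal name} from the schema.
    Returns (mapping, unmatched); unmatched properties are candidates for
    the property bag.
    """
    by_id = {f["ID"].strip("{}").lower(): f["Name"] for f in server_fields}
    names = {f["Name"] for f in server_fields}
    mapping, unmatched = {}, []
    for prop, hint in client_props.items():
        guid = (hint.get("id") or "").strip("{}").lower()
        if guid and guid in by_id:
            mapping[prop] = by_id[guid]        # preferred: match by field GUID
        elif hint.get("name") in names:
            mapping[prop] = hint["name"]       # fallback: match by internal name
        else:
            unmatched.append(prop)             # no field: use the property bag
    return mapping, unmatched


# Placeholder GUIDs, not real SharePoint field IDs.
server_fields = [
    {"ID": "{11111111-1111-1111-1111-111111111111}", "Name": "Title"},
    {"ID": "{22222222-2222-2222-2222-222222222222}", "Name": "DueDate"},
]
client_props = {
    "subject": {"id": "11111111-1111-1111-1111-111111111111", "name": None},
    "due": {"id": None, "name": "DueDate"},
    "reminder_sound": {"id": None, "name": "ReminderSound"},
}
mapping, unmatched = map_properties(client_props, server_fields)
print(mapping, unmatched)
```

The returned mapping is what the client would cache so that it does not need to fetch the list schema on every call.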

GetListItemChangesSinceToken is a very fast call when no changes have been made to the list. GetList is not quite as fast.

When a client calls GetListItemChangesSinceToken, it will receive a new list schema if the schema has changed. Unfortunately, if the schema change indicates a change in the internal name of one of the fields which are requested by name, the results of the call must be discarded. This is because the field you requested by name may not be included in the result set.

There are a few snags with respect to client use of the property bag.  If, when making an update, a client doesn't know which particular property has changed, it must update every property (even the ones not set) so that it is sure to clear the ones recently emptied.

Note   A field with an empty value is equivalent to an absent field, whereas a property with an empty value is not equivalent to an absent property.
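One way to picture the "update every property" rule is the Python sketch below, which emits a MetaInfo field for each property the client tracks and sends an empty value for any property that no longer has one. The helper name and the property names are hypothetical; the Field Name="MetaInfo" Property="..." form is the one shown earlier in this post.

```python
import xml.etree.ElementTree as ET


def property_fields(tracked_names, current_values):
    """Return one MetaInfo <Field> per tracked property.

    Properties missing from current_values are sent with an empty value,
    because the client cannot delete a name/value pair outright.
    """
    fields = []
    for name in sorted(tracked_names):
        field = ET.Element("Field", {"Name": "MetaInfo", "Property": name})
        field.text = current_values.get(name, "")
        fields.append(field)
    return fields


# Hypothetical client properties: the reminder was cleared locally.
tracked = {"client_color", "client_reminder"}
current = {"client_color": "blue"}
fields = property_fields(tracked, current)
for field in fields:
    print(field.get("Property"), repr(field.text))
```

Because the client here does not know which property changed, it sends all of them, including the emptied one.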

If a new field is added to the list, values stored by the client in an equivalent property are not automatically promoted into the field value. The client must decide whether to respect the value in the property or the value in the field, possibly losing user data in the process.

Document Sync

Clients should use HTTP/DAV to sync document content. Although some SharePoint web methods support document content fetch and update (for attachments), this is not the preferred way of transferring binary document content. Because SOAP is based on XML, binary documents need to be encoded into an XML-compliant form. This is generally accomplished using hex encoding, which roughly doubles the bandwidth used.

A core field in document libraries (and also in generic lists in Windows SharePoint Services 3.0) is FileRef, which is basically a combination of two other fields associated with columns in the SQL table: DirName (server path to the containing folder) and LeafName (name of the document).

Document libraries also have two other fields computed from FileRef: ServerUrl and EncodedAbsUrl.

Either of these can be used to reference a document.

On a generic list, another field (Attachments) is used to determine if the item has attachments.

Although there is a separate method to determine what these attachments are, by default there is no way to quickly determine if any attachments of an item have changed. This is what the IncludeAttachmentUrls and IncludeAttachmentVersion query options are for.

When these options are used, the value of this field should contain ;#[AttachmentUrl];#[AttachmentGuid],[AttachmentVersion] for each attachment. The client can compare these values with what it has stored to determine which attachments need to be re-fetched.
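The comparison could look roughly like the following Python sketch. It assumes the field value is a sequence of ;#[AttachmentUrl];#[AttachmentGuid],[AttachmentVersion] groups as described above; the URLs, the shortened GUID placeholders, and both function names are invented for the example, and the real value may differ in detail.

```python
def parse_attachments(value):
    """Parse ';#url;#guid,version' groups into {url: (guid, version)}."""
    parts = [p for p in value.split(";#") if p]
    attachments = {}
    # Parts alternate: URL, then "guid,version".
    for url, ident in zip(parts[0::2], parts[1::2]):
        guid, _, version = ident.rpartition(",")
        attachments[url] = (guid, int(version))
    return attachments


def attachments_to_fetch(server_value, stored):
    """Return URLs whose (guid, version) differs from the stored copy."""
    current = parse_attachments(server_value)
    return sorted(url for url, ident in current.items() if stored.get(url) != ident)


# Invented sample value with placeholder GUIDs.
server_value = (
    ";#http://server/Lists/Tasks/Attachments/1/a.txt;#{AAAA},1"
    ";#http://server/Lists/Tasks/Attachments/1/b.txt;#{BBBB},3"
)
stored = {
    "http://server/Lists/Tasks/Attachments/1/a.txt": ("{AAAA}", 1),
    "http://server/Lists/Tasks/Attachments/1/b.txt": ("{BBBB}", 2),
}
to_fetch = attachments_to_fetch(server_value, stored)
print(to_fetch)
```

Only b.txt is re-fetched here, because its version on the server is newer than the stored one.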

Because document contents can be quite big, a client can support a header-only mode that lets the user decide which document contents should be taken offline. MaxBulkDocumentSyncSize is a property that can be set on the server to guide the client on when to automatically sync all contents.

Note    Some generic lists have content that can be quite large. Discussion Boards are an example. The content of a discussion item is stored in a couple of rich text fields. The header-only concept could also be used here by separating the call for the generic fields from a separate GetListItems call for the contents. We have no examples of the best way of doing this, and we have not analyzed it for performance.

Conflict Detection

Besides performance considerations, discussed below, this is arguably the most important sync topic: How to detect and deal with an object that was modified on both sides.

Fetch-Send-Refresh

It is better to have conflicts detected and resolved by the client. The client can better detect that the changes didn't actually conflict, it can raise an alert for the user to manually correct the conflict, and it can store a copy of the changes applied by the local user in the user's storage.

Thus it is best if a sync operation consists first of fetching the changed data from the server, then detecting any conflicts with changes in the client copy, and finally uploading those changes if there is no conflict.

SharePoint objects may have some business logic applied at the point of the update. Because of that, the UpdateListItems method returns the updated values of all the fields and properties of the updated items.

owshiddenversion Field

Windows SharePoint Services 3.0 uses this field to detect conflicts. If the field value is not supplied on update, the server will overwrite any changes. A client should always supply it on update to prevent data loss. The value should be whatever the server last sent.

This lets the server tell whether you are updating a stale copy of the item.  For example, a client syncs an item and gets a value of '2' for this attribute.  Someone changes the title of the item on the server, so the value increments to '3'.  When the client sends a change with value '2', the server reports a conflict because the item has been modified since the client last requested it.

When there is a conflict, the server will return a TP_E_VERSIONCONFLICT (0x81020015) error and the current contents of the item.
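The version check can be illustrated with a small in-memory simulation in Python. FakeList is a made-up stand-in that mimics only the behavior described in this section (overwrite when no version is supplied, conflict when the supplied version is stale); it is not SharePoint code and does not call the web service.

```python
TP_E_VERSIONCONFLICT = 0x81020015  # error code named in the text


class FakeList:
    """In-memory stand-in for the server side of the version check."""

    def __init__(self):
        self.items = {}

    def add(self, item_id, fields):
        self.items[item_id] = dict(fields, owshiddenversion=1)

    def update(self, item_id, fields, owshiddenversion=None):
        """Return (error_code, current_item), like a simplified <Result>."""
        item = self.items[item_id]
        # No version supplied: the update overwrites without any check.
        if owshiddenversion is not None and owshiddenversion != item["owshiddenversion"]:
            return TP_E_VERSIONCONFLICT, dict(item)
        item.update(fields)
        item["owshiddenversion"] += 1
        return 0, dict(item)


server = FakeList()
server.add(1, {"Title": "Draft"})                     # version 1
_, synced = server.update(1, {"Title": "Plan"}, 1)    # version 2, cached by the client
server.update(1, {"Title": "Plan v2"}, 2)             # someone else: version 3
# The client now sends its change with the stale version 2.
error, current = server.update(1, {"Title": "My plan"}, synced["owshiddenversion"])
print(hex(error), current)
```

On a conflict the client gets back the current item, which it can use to compare against its local change before deciding how to resolve it.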

vti_versionhistory Property

The hidden version field is sufficient for simple conflict detection when all clients are synchronizing with a central server; however, peer-to-peer synchronization presents further challenges. You want to avoid raising unnecessary conflicts when the change was synchronized by a peer client.

This situation may also happen in a non-peer-to-peer scenario. If a client successfully uploads a change to the server but does not receive an acknowledgment (response from UpdateListItems), the client needs a way to know that its changes were uploaded on next sync.

ETag DAV Header

Document and attachment fetches and updates are done through HTTP/DAV. For that protocol, we have a separate mechanism for conflict detection. In every HTTP GET of a file (be it a document in a list, an attachment, or a page outside a list) we return an ETag, which is supposed to be another blob that contains a GUID and a version number. When uploading a document with HTTP PUT, a client should request that the ETag matches the one supplied.

Note   Version history is not supported for this protocol.
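A conditional upload of this kind might be built as in the Python sketch below. It assumes the standard HTTP If-Match request header is how the client asks for the ETag to match; the URL, the ETag value, and the function name are placeholders. The request is only constructed, not sent.

```python
import urllib.request


def build_conditional_put(url, content, etag):
    """Build (but do not send) an HTTP PUT that is conditional on the ETag."""
    request = urllib.request.Request(url, data=content, method="PUT")
    # If-Match asks the server to reject the upload if the file has changed.
    request.add_header("If-Match", etag)
    return request


# Placeholder values: the ETag would be the one returned by the last GET.
etag = '"{PLACEHOLDER-GUID},3"'
request = build_conditional_put(
    "http://server/Shared%20Documents/plan.docx", b"new file bytes", etag
)
print(request.get_method(), request.headers)
```

Under standard HTTP semantics, a server that honors If-Match answers with 412 Precondition Failed when the ETag no longer matches, which the client can treat as a conflict.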

Attachments

Although updating attachments can be done through HTTP/DAV, adding attachments requires using the AddAttachment method from the lists.asmx web service which takes a binary array and returns the URL of the attachment.

For more information about AddAttachment, refer to the lists.asmx web service.

Performance

There are some performance issues to keep in mind when you are syncing with a server.

Latency

The amount of time it takes for an action to complete is what matters to a user. Also, there are usually limits on the amount of time allowed to process a request on a database, on a front end, and for the entire request, so extremely long requests can turn into denied requests. Even setting those limits aside, you want to be able to give users feedback on the action.  This is why paging is required.

Using the row limit property on GetListItemChangesSinceToken to limit the amount of data requested each time is crucial for the above reasons, but it should be clear that it will also increase the total amount of time to complete a sync process.

Throughput

Obviously, reducing the total number of cycles required to process a request helps performance by reducing latency. However, with multiple clients, it is more important to reduce the adverse effects one client has on the others. Most of the time, it is easier, cleaner, safer, and more effective to make the server do some processing than to implement the same processing on multiple clients. However, to increase throughput, it is almost always better to do that work on the client. Although the server will likely be a lot more powerful, the client will likely have more available CPU time. A sync client should make its data requests to the server as small and simple as possible.

Bandwidth

We target high-bandwidth scenarios, but even then, it is important to try to minimize the amount of data sent across the wire.

If a client is not going to require a piece of information, it should avoid requesting it.

Paging

When performing a full sync (no change token), the client should request a maximum number of items returned per page using the rowLimit parameter. If the filtered number of items in the list is greater than that number, the server will return a ListItemCollectionPositionNext attribute to be used to request the next page.

We will only return the current change token of the list on the first page to prevent the loss of any changes being made to the first page. The client should store the change token from the first page for a subsequent incremental sync.

Secondary pages will also not include list and global properties like permissions, alternate urls and TTL.
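The full-sync paging loop can be sketched in Python as follows. The fetch_page callable and the dictionary it returns are hypothetical stand-ins for a GetListItemChangesSinceToken call and its parsed response, with next_position playing the role of the ListItemCollectionPositionNext attribute; the fake server exists only to make the sketch runnable.

```python
def full_sync(fetch_page, row_limit):
    """Fetch every page of a full sync; return (items, change_token).

    fetch_page(row_limit, position) is a hypothetical callable returning a
    dict with "items" and, optionally, "change_token" and "next_position".
    """
    items, change_token, position = [], None, None
    while True:
        page = fetch_page(row_limit, position)
        items.extend(page["items"])
        if change_token is None:
            # Only the first page carries the change token; keep it.
            change_token = page.get("change_token")
        position = page.get("next_position")
        if not position:
            return items, change_token


def make_fake_server(all_items):
    """Simulated server that pages through a fixed list of items."""
    def fetch_page(row_limit, position):
        start = int(position) if position else 0
        page = {"items": all_items[start:start + row_limit]}
        if start == 0:
            page["change_token"] = "token-at-first-page"
        if start + row_limit < len(all_items):
            page["next_position"] = str(start + row_limit)
        return page
    return fetch_page


items, token = full_sync(make_fake_server([1, 2, 3, 4, 5]), row_limit=2)
print(items, token)
```

The token kept from the first page is the one the client would pass on its next incremental sync.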

 

rowLimit is also supported on incremental syncs (change token supplied), but on an incremental sync this will limit the processing of our internal change log, and we have an internal limit of 100. Although the client can be sure the number of items returned will never be greater than that limit, in certain circumstances not all changes may have been synchronized even if the number of items returned is smaller than the limit. This is because we will stop processing the change log as soon as we reach a number of updates equal to the limit. When that is the case, we return the MoreChanges attribute to indicate there are more changes in the change log. Instead of waiting for the next sync period, the client should request more changes immediately using the returned change token.
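An incremental sync that keeps requesting changes while MoreChanges is set could look like this Python sketch. The fetch_changes callable and its return shape are hypothetical, and the fake change log only simulates the limit of 100 described above.

```python
def incremental_sync(fetch_changes, change_token):
    """Drain the change log; return (changes, latest_token).

    fetch_changes(token) is a hypothetical callable returning a dict with
    "changes", "change_token" and a boolean "more_changes".
    """
    changes = []
    while True:
        result = fetch_changes(change_token)
        changes.extend(result["changes"])
        # Always continue from the token the server just returned.
        change_token = result["change_token"]
        if not result["more_changes"]:
            return changes, change_token


calls = []


def make_fake_change_log(log, limit=100):
    """Simulated change log that returns at most `limit` entries per call."""
    def fetch_changes(token):
        calls.append(token)
        start = int(token)
        end = min(start + limit, len(log))
        return {
            "changes": log[start:end],
            "change_token": str(end),
            "more_changes": end < len(log),
        }
    return fetch_changes


changes, latest_token = incremental_sync(make_fake_change_log(list(range(250))), "0")
print(len(changes), latest_token, len(calls))
```

With 250 pending changes and a limit of 100, the loop makes three calls back to back instead of waiting for the next sync period.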

 

The limit works this way on incremental sync for a few reasons:

  • We have an internal limit of 100 because we don't have a modified time index we can use to filter the items returned, and SQL has a limit of 160 on the number of ORs in a query and starts performing badly close to that number. 100 gives us a potential extra 60 as part of the filter requested by the client.
  • We could have made several separate SQL queries, but that would imply supporting all ordering and filtering on the middle tier.

Because of this, we needed to return a change token that is not current, so that extra changes could be processed on a separate call. We could have still looked at the entire change log to better determine the latest point at which the number of items returned would be smaller than the limit, but even this wouldn't be accurate without filtering on the middle tier.

Filtering and Ordering

Filtering is a way to allow the user to only get a certain set of items in a list. The two most common usages of this are for folder sync, where the user only gets the items inside a folder, and for certain Group Board scenarios where the user only gets the items associated with him/her.

Filtering can be done using the contains parameter or the query parameter. Contains is more restrictive since it is basically the Where clause of a SharePoint CAML query, while query is the full query. Contains is safer to use because we can optimize certain scenarios. Query is more powerful and flexible, but the caller must understand its performance effects.

A client should avoid filtering by a non-indexed column. Otherwise, fetching a page will require a scan of the entire list until it finds the number of items requested.

A client should also avoid requesting an order unless the column of the order is indexed. Otherwise, fetching a page will at a minimum require a sort on the entire filtered dataset.

Finally, if the filter is not on the same indexed column as the order, SQL may still scan the entire list to avoid sorting the filtered dataset.

An incremental sync has an implicit filter: we will request items with a certain ID. In this case, the client should never order by something other than ID. Filtering by something else is OK, because the dataset is restricted to a maximum of 100.

Filtering by folder can be done using the Folder query option, but the list should be ordered by the FileLeafRef. For a recursive query, it should first be ordered by FileDirRef as well.

There is also a way to filter by multiple folders using something like

"<Or><BeginsWith><FieldRef Name="FileRef"/><Value Type="Note">Shared Documents/folder1/</Value></BeginsWith><BeginsWith><FieldRef Name="FileRef"/><Value Type="Note">Shared Documents/folder2/</Value></BeginsWith></Or>".

This will synchronize the full contents of folder1 and folder2.

The client should use this in the contains parameter and add the following query option:

"<OptimizeFor>FolderUrls</OptimizeFor> "

This will make sure the SQL query is optimized appropriately by ordering it by FileDirRef, FileLeafRef and constraining the right columns.
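For clients that build the multi-folder filter programmatically, a Python sketch along these lines may help. The function names are made up; the BeginsWith, FieldRef, and Value markup follows the example above. Because a CAML Or element combines two operands, the sketch nests Or elements when more than two folders are given.

```python
import xml.etree.ElementTree as ET


def begins_with(folder_url):
    """Build one <BeginsWith> clause on FileRef for a folder URL."""
    element = ET.Element("BeginsWith")
    ET.SubElement(element, "FieldRef", {"Name": "FileRef"})
    value = ET.SubElement(element, "Value", {"Type": "Note"})
    value.text = folder_url
    return element


def folders_filter(folder_urls):
    """Combine one clause per folder, nesting <Or> two operands at a time."""
    if not folder_urls:
        raise ValueError("at least one folder URL is required")
    clauses = [begins_with(url) for url in folder_urls]
    combined = clauses[0]
    for clause in clauses[1:]:
        either = ET.Element("Or")
        either.extend([combined, clause])
        combined = either
    return ET.tostring(combined, encoding="unicode")


two = folders_filter(["Shared Documents/folder1/", "Shared Documents/folder2/"])
three = folders_filter([
    "Shared Documents/folder1/",
    "Shared Documents/folder2/",
    "Shared Documents/folder3/",
])
print(two)
```

The string returned for two folders has the same structure as the filter shown above and would go in the contains parameter, together with the OptimizeFor query option.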

 

Conclusion

The most efficient way to synchronize with Windows SharePoint Services is to download only those items that have changed since the last synchronization occurred. In Windows SharePoint Services 3.0 this can be done by calling the GetListItemChangesSinceToken Web method.

GetListItemChangesSinceToken allows clients to track changes on a list.  Changes, including deleted items, are returned along with a token that represents the moment in time when those changes were requested. By including this token the next time you call GetListItemChangesSinceToken, the server looks for only those changes that have occurred since the token was generated.

We discussed considerations that the developer must keep in mind to obtain the best possible performance.

Acknowledgements

I would like to acknowledge the following persons for their gracious help in technical reviews for this article: Matt Swann (Microsoft Corporation), Bill Snead (Microsoft Corporation).

See Also

https://blogs.msdn.com/sharepointdeveloperdocs/archive/2008/01/21/synchronizing-with-windows-sharepoint-services-part-1.aspx
