Content Migration in SharePoint (Post 2):

Key Concepts in Selective Migration 

Picking up where the first post left off, today's foray explores the basic concepts of SharePoint content migration. Most of these concepts come into play in nearly every migration scenario that uses the APIs in the Microsoft.SharePoint.Deployment namespace. This post covers the following concepts:

  1. Content selection (also known as "cherry picking")
  2. The content migration package (.cmp) file
  3. Object identity (that is, retaining object identity on import)
  4. Reparenting objects
  5. Handling content types when migrating a site collection
  6. Handling workflows when migrating

 

     Content selection for export

Once the initial full migration has occurred and the source and destination sites are established, selective migration relies on selection criteria, implemented in your custom code, that determine which items are migrated and when. This process, also known as "cherry picking," migrates specific items from the source site collection to the destination site collection based on those criteria.

In the basic scenario, logic in a code module associated with a timer job runs on a recurring schedule (for example, every 24 hours at 3:00 a.m.); the module examines date stamps or change logs (or both), then selects and exports only those files that have changed since the last migration.

The operation relies on the identity of objects in the source and destination site collections. At import time, the destination recognizes objects by GUID identifiers and then updates their corresponding files accordingly.
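The timer-job scenario above can be sketched as follows. This is a minimal sketch, not a definitive implementation: it exports the source site collection incrementally by using the ExportMethod, ExportChangeToken, and CurrentChangeToken properties of SPExportSettings. The class name CherryPickExport, the site URL, the export folder, and the token file (c:\export\lastChangeToken.txt) used to persist the change token between runs are all hypothetical, and the scheduling itself (for example, a timer job) is left out.

```csharp
using System;
using System.IO;
using Microsoft.SharePoint.Deployment;

// Sketch only: incremental export of the source site collection.
// The site URL, export folder, and token file are hypothetical placeholders.
public static class CherryPickExport
{
    const string TokenFile = @"c:\export\lastChangeToken.txt";

    public static SPExportSettings BuildSettings(string lastChangeToken)
    {
        SPExportSettings settings = new SPExportSettings();
        settings.SiteUrl = "https://localhost:2001";
        settings.FileLocation = @"c:\export";
        settings.BaseFileName = "changes.cmp";
        // Replace the package left behind by the previous run.
        settings.OverwriteExistingDataFile = true;

        if (string.IsNullOrEmpty(lastChangeToken))
        {
            // No earlier run recorded: export everything.
            settings.ExportMethod = SPExportMethodType.ExportAll;
        }
        else
        {
            // Export only what changed since the recorded token.
            settings.ExportMethod = SPExportMethodType.ExportChanges;
            settings.ExportChangeToken = lastChangeToken;
        }
        return settings;
    }

    // Intended to be called on a schedule, for example daily at 3:00 a.m.
    public static void Run()
    {
        string lastToken = File.Exists(TokenFile) ? File.ReadAllText(TokenFile) : null;
        SPExportSettings settings = BuildSettings(lastToken);

        SPExport export = new SPExport(settings);
        export.Run();

        // Store the token that was current at export time for the next run.
        File.WriteAllText(TokenFile, settings.CurrentChangeToken);
    }
}
```

On the first run there is no stored token, so the sketch falls back to a full export; each later run exports only the changes recorded since the previous one.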

     Content migration package (.cmp) file

The content migration package is a compressed cabinet (.cab) file that uses the .cmp file extension. The migration package (.cmp) file aggregates data for export from the source site, along with site structure and dependency data, and stores it as compressed, serialized XML files.

While the export operation compresses site data by default, you can override this behavior and export uncompressed files by setting the FileCompression property to false. FileCompression is defined on the SPDeploymentSettings base class, so it is available on both the SPExportSettings and SPImportSettings objects.

When using file compression, you must specify the name of the compressed content migration (.cmp) file by using the BaseFileName property on the SPExportSettings object. The FileLocation property specifies the folder on the source server where the export operation writes the package.

By default, each .cmp file is limited to 24 MB, although you can change this limit by using the FileMaxSize property. When site data exceeds the limit, it is split into two or more migration files. However, if a single file exceeds the maximum size, the operation enlarges that .cmp file to accommodate it. A package can contain any number of .cmp files.
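To illustrate, the package-related settings discussed above might be combined as in this sketch. The PackageSettings class, the site URL, the folder c:\export, the file name MySite.cmp, and the 50 MB limit are hypothetical values chosen for the example; FileMaxSize is expressed in megabytes.

```csharp
using Microsoft.SharePoint.Deployment;

// Sketch: package-file settings for an export. All values are hypothetical.
public static class PackageSettings
{
    public static SPExportSettings Build(bool compress)
    {
        SPExportSettings settings = new SPExportSettings();
        settings.SiteUrl = "https://localhost:2001";
        // Folder on the source server that receives the package.
        settings.FileLocation = @"c:\export";
        // true (the default) writes .cmp files; false writes uncompressed files.
        settings.FileCompression = compress;

        if (compress)
        {
            // Required when compressing: the name of the .cmp file.
            settings.BaseFileName = "MySite.cmp";
            // Per-file size limit in MB (24 by default).
            settings.FileMaxSize = 50;
        }
        return settings;
    }
}
```

Raising FileMaxSize produces fewer, larger .cmp files; leaving it at the default keeps each file at 24 MB or less, except where a single file needs more room.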

     Object identity

SharePoint objects are identified by GUID on the source server. By default, objects are assigned new GUIDs when they are imported to the destination. To support selective migration scenarios, however, you must override this default and retain the identity of the objects that you export. This is very important, because the only way that files can successfully update their corresponding versions on the destination site is for the destination to recognize each object by its identifying GUID.

You can retain object identity by setting the RetainObjectIdentity property on the SPImportSettings object to true.

Note: To use the RetainObjectIdentity property to support selective migrations, you must also set the ExportMethod property on the SPExportSettings object to ExportChanges.
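On the import side, retaining identity comes down to a single property. The following minimal sketch assumes that the package was produced by an export with ExportMethod set to ExportChanges; the MirrorImport class, the destination URL http://production-server, and the package folder and file name are hypothetical.

```csharp
using Microsoft.SharePoint.Deployment;

// Sketch: import side of a selective (mirror) migration.
// The destination URL and package folder/name are hypothetical.
public static class MirrorImport
{
    public static SPImportSettings BuildSettings()
    {
        SPImportSettings settings = new SPImportSettings();
        settings.SiteUrl = "http://production-server";
        settings.FileLocation = @"c:\export";
        // The package is compressed (the default), so name the .cmp file.
        settings.BaseFileName = "changes.cmp";
        // Keep source GUIDs so changed items update their mirrors on the destination.
        settings.RetainObjectIdentity = true;
        return settings;
    }

    public static void Run()
    {
        SPImport import = new SPImport(BuildSettings());
        import.Run();
    }
}
```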

Retaining object identity should be used carefully, and only in limited scenarios. It most commonly supports a publishing scenario in which the source server and one or more destination servers are mirror images, for example, where a development or test server migrates updates to one or more production servers.

If you are creating a new site that you know in advance will receive file updates on a regular basis (for example, from a development or test server), you can establish a clean link between the two by first doing a full migration from the source to a completely blank site, with the RetainObjectIdentity property set to true. After that, it is very important that you never modify the destination directly. The initial migration establishes the mirrored hierarchy and mappings, and any modification to the destination can break that linkage. If the relationship breaks, your only recourse is to delete the destination and re-establish the linkage as you did originally.
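The one-time baseline described above might look like the following sketch. It assumes that the package folder (c:\baseline, a hypothetical path) is reachable from both servers, or that you copy the package between them; the BaselineMigration class and both URLs are likewise placeholders. No ExportObjects are added, which, as we understand the API, exports the entire site collection.

```csharp
using Microsoft.SharePoint.Deployment;

// Sketch: one-time baseline from the source site collection to an empty one.
// URLs and the package folder are hypothetical.
public static class BaselineMigration
{
    public static SPExportSettings BuildExport()
    {
        SPExportSettings settings = new SPExportSettings();
        settings.SiteUrl = "https://localhost:2001";     // source
        settings.FileLocation = @"c:\baseline";
        settings.BaseFileName = "baseline.cmp";
        // Full export; with no ExportObjects, the whole site collection is exported.
        settings.ExportMethod = SPExportMethodType.ExportAll;
        return settings;
    }

    public static SPImportSettings BuildImport()
    {
        SPImportSettings settings = new SPImportSettings();
        settings.SiteUrl = "http://production-server";   // blank destination
        settings.FileLocation = @"c:\baseline";
        settings.BaseFileName = "baseline.cmp";
        // Establishes the GUID mapping that later selective imports rely on.
        settings.RetainObjectIdentity = true;
        return settings;
    }

    // Returns the change token to use for the first incremental export.
    public static string Run()
    {
        SPExportSettings exportSettings = BuildExport();
        new SPExport(exportSettings).Run();
        new SPImport(BuildImport()).Run();
        return exportSettings.CurrentChangeToken;
    }
}
```

Keeping the token returned by Run gives the first ExportChanges run a starting point that matches the baseline exactly.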

Note: In the scenario described above, you can use Stsadm.exe for the export portion of the migration. However, because Stsadm.exe does not support retaining object identity on import, you must use the migration APIs for the import portion.

SharePoint content databases do not permit duplicate GUIDs or file names, so you must be careful when using this property. You should not retain object identity when you are simply updating files on the destination server, because the import operation cannot determine whether a file is new or an update, and will therefore place a duplicate copy of the file on the destination.

This happens because the migration process uses GUID identifiers only down to the level of the item collection. Below that level, that is, for individual files, the system uses the file name. Consequently, the import adds the updated version of the file alongside the existing one and leaves the original intact, potentially creating duplicates. When the destination server then encounters duplicate file names, it cannot resolve the ambiguity and becomes unstable.

Important: When deleting files, be sure to delete them on both the source and destination servers. Failing to do so can cause duplicate file names and an unstable site.

To support field deletion (a feature that deletes previous versions of items on the source server when the migration updates and deletes the same file on the destination), the RetainObjectIdentity property must be set to true.

 

     Reparenting

Now let's consider a much different scenario. Instead of a publishing scenario, in which objects on the source server have to map to their equivalent files on the destination server, consider instead a scenario in which a specific subweb or list item is migrated and you want to place it at a different place in the destination hierarchy. This is a very common requirement and you handle this by reparenting the object (or objects) on import.

It's important to note that, as discussed in the section on retaining object identity, any time that you do not retain object identity, a migrated object will need a new parent specified on import. It doesn't matter if the destination site collection is the same site collection as the source or even if it is in the same database as the source site collection. It also doesn't matter if the destination site collection is in the same or a different farm as the source site collection.

If you export an item from a database without exporting its parent, then the item that you've exported will become orphaned in the content migration package, but that's not necessarily a problem. A package can contain multiple orphaned objects.
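For example, the following sketch exports a single document library item without its parent list, so the item ends up orphaned in the package. The ExportSingleItem class, the list name, and the item ID passed to Run are hypothetical; the folder c:\deployment5 and the uncompressed output are chosen to match the ImportDocLibItem example later in this post.

```csharp
using System;
using Microsoft.SharePoint;
using Microsoft.SharePoint.Deployment;

// Sketch: export one list item without its parent list.
// Site URL, list name, and item ID are hypothetical.
public static class ExportSingleItem
{
    public static SPExportSettings BuildSettings(Guid itemUniqueId)
    {
        SPExportObject exportObject = new SPExportObject();
        exportObject.Id = itemUniqueId;                   // SPListItem.UniqueId
        exportObject.Type = SPDeploymentObjectType.ListItem;

        SPExportSettings settings = new SPExportSettings();
        settings.SiteUrl = "https://localhost:2001";
        settings.FileLocation = @"c:\deployment5";
        settings.FileCompression = false;                 // uncompressed package files
        settings.ExportObjects.Add(exportObject);
        return settings;
    }

    public static void Run(string listName, int itemId)
    {
        using (SPSite site = new SPSite("https://localhost:2001"))
        using (SPWeb web = site.OpenWeb())
        {
            SPListItem item = web.Lists[listName].GetItemById(itemId);
            SPExport export = new SPExport(BuildSettings(item.UniqueId));
            export.Run();
        }
    }
}
```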

Orphaned objects in the migration package are not a problem, because the import operation lets you define a new parent for each of them. This is what is meant by "reparenting." However, objects that are exported together with their parent objects keep their relationships during migration. For example, if you export Web A, and Web B is a subweb of Web A, then you can change the parent of Web A only. The parent of Web B remains Web A, because Web B is not an orphan in the package.

There are two ways to reparent orphaned objects when importing. In the first, you import all of your orphaned objects into the same subweb, while in the other method you assign parent objects individually to each orphaned object.

      Reparent by importing all orphans to the same subweb

You can reparent your orphan objects by simply importing all of them into the same subweb. However, this only works if all of the orphan objects are destined for the same subweb. For example, this is the preferred method if you have exported two or more list or document library objects and you want to import them all into the same subweb. This works even if the export objects are from different source sites. Here is some sample code for doing this. Of particular note are the WebUrl and RetainObjectIdentity properties.

 

SPImportSettings settings = new SPImportSettings();
settings.SiteUrl = "https://localhost:2001";
settings.WebUrl = "https://localhost:2001/MyDestinationWeb";
settings.FileLocation = @"c:\export";
settings.FileCompression = false;
settings.RetainObjectIdentity = false;

SPImport import = new SPImport(settings);
import.Run();

 

In the above example, the WebUrl property defines the site (the SPWeb) that becomes the new parent for all orphaned objects in the export package.

Note: You cannot use this method to reparent orphaned objects if the migration package contains orphaned documents, because a SharePoint site cannot be the parent of a document object (only list or folder objects can parent documents).

Note, too, that the RetainObjectIdentity property is set to false. This is important because leaving this property false (the default) causes the operation to assign new GUIDs to the orphaned objects, which is essential for reparenting them.

      Reparent by assigning new parent objects individually

Assigning new parents to orphaned objects individually is much more flexible than the previous method, but it also requires more code. In this approach, you intercept the import operation and assign a new parent to each orphaned object after the operation has compiled a list of all orphaned objects in the migration package. You can do this by implementing a custom event handler, as in the following example:

 

static void OnImportStarted(object sender, SPDeploymentEventArgs args)
{
   // Disposing the SPSite also releases its RootWeb.
   using (SPSite site = new SPSite("https://localhost:2001"))
   {
      SPWeb web = site.RootWeb;

      SPImportObjectCollection rootObjects = args.RootObjects;
      foreach (SPImportObject io in rootObjects)
      {
         // Make the root web the new parent of every orphaned object.
         io.TargetParentUrl = web.Url;
      }
   }
}

 

This handler iterates over the RootObjects collection of the event arguments, which contains the orphaned objects, and reparents each one. In effect, it does the same thing as importing all of the objects into the same subweb: it defines a specific site (SPWeb) as the new parent of every orphaned object. However, you can extend the logic in the event handler to, for example, assign different parents to different orphans based on object type, as shown here:

 

static void OnImportStarted(object sender, SPDeploymentEventArgs args)
{
   // Disposing the SPSite also releases its RootWeb.
   using (SPSite site = new SPSite("https://localhost:2001"))
   {
      SPWeb web = site.RootWeb;
      SPList list = web.Lists["MyDocLib"];

      SPImportObjectCollection rootObjects = args.RootObjects;
      foreach (SPImportObject io in rootObjects)
      {
         // Orphaned list items (documents) go into the MyDocLib library.
         if (io.Type == SPDeploymentObjectType.ListItem)
         {
            io.TargetParentUrl = list.RootFolder.ServerRelativeUrl;
         }
         // Orphaned lists go directly under the root web.
         else if (io.Type == SPDeploymentObjectType.List)
         {
            io.TargetParentUrl = web.Url;
         }
         // Add branches for other object types as needed.
      }
   }
}

 

There is a great deal of flexibility here. In addition to reparenting based on the object type, you could instead look at the original TargetParentUrl, for example, to obtain the location of the source and then include that as part of your reparenting logic.
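As a sketch of that idea, the handler below rewrites each orphan's parent based on its original location. It assumes, as described above, that TargetParentUrl initially holds the source parent location, and it uses hypothetical server-relative prefixes (/Projects/Shared Documents mapped to /Archive/Shared Documents). The ReparentBySource class is also hypothetical; the mapping rule lives in a separate MapParent method so that it can be checked independently of an actual import.

```csharp
using System;
using Microsoft.SharePoint.Deployment;

// Sketch: reparent orphans according to where they came from.
// The prefixes below are hypothetical.
public static class ReparentBySource
{
    const string SourcePrefix = "/Projects/Shared Documents";
    const string TargetPrefix = "/Archive/Shared Documents";

    // Maps an original parent URL to its new location; other URLs are returned unchanged.
    public static string MapParent(string originalParentUrl)
    {
        if (originalParentUrl == null)
        {
            return null;
        }
        bool matches =
            originalParentUrl.Equals(SourcePrefix, StringComparison.OrdinalIgnoreCase) ||
            originalParentUrl.StartsWith(SourcePrefix + "/", StringComparison.OrdinalIgnoreCase);
        return matches
            ? TargetPrefix + originalParentUrl.Substring(SourcePrefix.Length)
            : originalParentUrl;
    }

    public static void OnImportStarted(object sender, SPDeploymentEventArgs args)
    {
        foreach (SPImportObject io in args.RootObjects)
        {
            string newParent = MapParent(io.TargetParentUrl);
            // Only touch objects whose original location matched the rule.
            if (newParent != io.TargetParentUrl)
            {
                io.TargetParentUrl = newParent;
            }
        }
    }
}
```

Matching on the prefix followed by "/" avoids accidentally remapping a sibling location such as "/Projects/Shared Documents2".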

The following code shows how to hook up the event handler to the import operation:

 

 

static void ImportDocLibItem()
{
   SPImportSettings settings = new SPImportSettings();
   settings.SiteUrl = "https://localhost:2001";
   settings.FileLocation = @"c:\deployment5";
   settings.FileCompression = false;
   settings.RetainObjectIdentity = false;

   SPImport import = new SPImport(settings);

   EventHandler<SPDeploymentEventArgs> eventHandler = new EventHandler<SPDeploymentEventArgs>(OnImportStarted);
   import.Started += eventHandler;

   import.Run();
}

 

Notice that you must register the event handler with the SPImport object before you start the import operation.

 

SPImport exposes several more events that you can handle during import, and SPExport exposes similar events that you can register for export, if required. These events are described in more detail in the SDK reference topics for SPImport events and SPExport events.
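For instance, here is a minimal sketch of registering export handlers, assuming that SPExport raises the same Started and Completed events (typed EventHandler<SPDeploymentEventArgs>) that SPImport does. The ExportWithEvents class, the site URL, and the folder are placeholders, and the handlers only write to the console.

```csharp
using System;
using Microsoft.SharePoint.Deployment;

// Sketch: export with event handlers attached. Names and paths are hypothetical.
public static class ExportWithEvents
{
    static void OnExportStarted(object sender, SPDeploymentEventArgs args)
    {
        Console.WriteLine("Export started.");
    }

    static void OnExportCompleted(object sender, SPDeploymentEventArgs args)
    {
        Console.WriteLine("Export completed.");
    }

    // Builds an export with handlers attached; as with import, register them before Run().
    public static SPExport CreateExport()
    {
        SPExportSettings settings = new SPExportSettings();
        settings.SiteUrl = "https://localhost:2001";
        settings.FileLocation = @"c:\export";
        settings.FileCompression = false;

        SPExport export = new SPExport(settings);
        export.Started += new EventHandler<SPDeploymentEventArgs>(OnExportStarted);
        export.Completed += new EventHandler<SPDeploymentEventArgs>(OnExportCompleted);
        return export;
    }

    public static void Run()
    {
        CreateExport().Run();
    }
}
```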

 

 

     Content types (handling)

Migration operations that use the export and import commands include your content type definition schema files in the export package. However, where content type definitions reference custom SharePoint features, you must manually install and configure those features on the destination computer. References to features that are native to the SharePoint server environment are restored automatically.

When the import operation starts, you receive a warning message that identifies the custom features whose references must be re-created. You must resolve these references after the import completes.

For more information about content types, see Content Types. For more information about SharePoint features, see Working with Features.

 

     Workflows

Workflows are not included in export packages, so when you initially set up a migration target, you must copy any custom workflows onto the destination server and reconfigure the workflow association table. This includes creating (or restoring) the workflow definition files on the destination.

Of course, manually restoring your custom workflows on the migration destination is only necessary when you initially set up your destination server. Subsequently, when doing selective migrations in a publishing scenario, it is advantageous that workflows are excluded from the migration package.

For more information about managing workflows, see the following:

· Introduction to Workflows in Windows SharePoint Services

· Workflows in Windows SharePoint Services