Make sure you test your Content Deployment after installing SP1...

I have been testing WSS/MOSS Service Pack 1 on a large-scale WCM portal for the past couple of weeks and I ran into some issues with Content Deployment.  First of all, I do not want to say there's necessarily a widespread bug with Content Deployment and SP1.  Alright, you are warned :)  I do have an issue, though, and we are going through support in the hope that the product group can eventually enlighten us.  Since I do not see anyone else opening support cases for the same thing or writing about it on the web, I'm assuming that it's working in most scenarios.

 

In summary, make sure you test your Full Content Deployments after you install SP1 in your staging environment.

 

Now that it's said, let's see why I think we have an issue.  First of all, we hit the "Timed out" error much more often when using Content Deployment.  Sometimes the job keeps going, sometimes it doesn't.  The environment we are currently testing has been running a SINGLE Content Deployment job for the last 8, yes EIGHT, days.  Since it's in the import phase, the cancel button is not available and we cannot kill the job while it's running.  What are we importing?  About 100 MB of content with 21,000 objects.  It stopped doing anything significant after around 5,400 objects.  While I could try rebooting the server or playing with a few jobs, I am purposely leaving the server as-is so that the support engineer can see the issue first-hand.

 

While we had that issue, the customer (obviously) asked me whether we should install SP1 in production and whether we would have the same issue there.  The staging environment differs (barely) from production in that it runs in a VMWare virtual environment.  It has the same content as production (we did a manual export/import from production to staging).  The STSADM extension that limits versions (described here) hasn't been run in staging (we thought the export was done after it was run in production, but it actually wasn't, so we had multiple versions of each page/document).  So I started working on those 2 differences to see if they had an impact.

 

I did the test with 2 machines:

  • My development virtual environment running in VMWare with 2 GB RAM.  The SQL server is centralized on a killer physical box and it's running smoothly.
  • My personal Virtual PC on my laptop (dual-core Centrino with 3 GB RAM, of which 1.7 GB is allocated to the VPC).

 

Since a full Content Deployment job usually takes longer than a plain import, since it has often been "freezing" on my VMs since SP1, and since I didn't want to export every time, I decided to test with export/import.  So I did an export of a "vanilla site" similar to production (with much less content) that is about 100 MB (the CAB file is actually compressed to 17 MB) and 11,000 objects.  I deleted the destination site collection on both environments and started the imports.
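For anyone who wants to repeat this kind of test, here is a minimal Python sketch of timing a single import run.  It assumes `stsadm` is on the PATH and uses the standard `-o import -url -filename` form.  The helper names are made up for illustration, and any site URL or file name you pass in is your own, not the ones from these tests.

```python
import subprocess
import time


def build_import_command(site_url, export_file, stsadm="stsadm"):
    """Return the argument list for importing export_file into site_url.

    The stsadm location and both arguments are placeholders supplied by the caller.
    """
    return [stsadm, "-o", "import", "-url", site_url, "-filename", export_file]


def timed_run(command, runner=subprocess.run):
    """Run the command and return (elapsed_seconds, return_code).

    The runner parameter is only there so the timing logic can be tried
    without a SharePoint server at hand.
    """
    start = time.monotonic()
    result = runner(command)
    return time.monotonic() - start, result.returncode
```

Calling `timed_run(build_import_command(url, filename))` once per import, against the same export file, gives durations that can be compared from one run to the next.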

  • on VMWare, the first full import took 1h15min
  • on VPC, the first full import took 45min

So far, not too bad, although I am curious as to why it's taking THAT long for so few elements.  I was a bit surprised that VMWare took much longer than my VPC, given that I was still working in Outlook, IE and Word on the laptop at the same time.  VMWare 0, VPC 1 :)

 

Right after that, I started a 2nd import of the same "export" file again:

  • 3h30min on VMWare
  • 2h15min on VPC

This is a bit depressing.  I have the same content re-imported and it's taking much longer.  While I understand it's checking the file ... in most content deployment scenarios we don't really care; we simply want the same thing at the destination.

 

The 3rd time I ran it, I had the following:

  • 4h on VMWare, 3h on VPC

It's going up again!  The only difference this time is that there were already 2 versions and it's adding a 3rd version!!!
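To put the three runs side by side, here is a small Python sketch that turns the durations above into objects per minute and a slowdown factor relative to the first import.  The only inputs are the numbers already quoted in this post (11,000 objects and the six durations); the function name is just for illustration.

```python
OBJECTS = 11_000  # objects in the exported "vanilla site"

# Durations of the 1st, 2nd and 3rd import of the same export file, in minutes.
DURATIONS_MIN = {
    "VMWare": [75, 210, 240],
    "VPC": [45, 135, 180],
}


def summarize(durations, objects=OBJECTS):
    """Return (run number, objects per minute, slowdown vs. first run) per run."""
    first = durations[0]
    return [
        (run, round(objects / minutes, 1), round(minutes / first, 1))
        for run, minutes in enumerate(durations, start=1)
    ]


for machine, durations in DURATIONS_MIN.items():
    for run, rate, slowdown in summarize(durations):
        print(f"{machine} run {run}: {rate} objects/min, {slowdown}x the first run")
```

On these numbers the VMWare box drops from roughly 147 objects per minute on the first import to about 46 on the third, and the VPC from about 244 to about 61.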

 

I did some SQL profiling and couldn't see any request taking over 40 ms (and there were very few of those).  I was only capturing queries & stored procedures ... and I had over 100,000 lines in less than 5 minutes.  While everything seems to be running fine in Network Monitor on all environments (CPU, disks, memory, network are all green), it's simply not going fast.  Maybe it's the sheer number of queries that are running; I don't know, and I do not have answers from the product group yet, but I'm getting more concerned about the feature now.  Unfortunately, I didn't run the same type of tests before SP1 so I don't know how bad it was, but I know it wasn't going up exponentially.
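As a back-of-the-envelope check on the "sheer number of queries" idea, the sketch below converts the trace size into a statement rate.  It uses only the figures above (100,000 lines in 5 minutes, which is a lower bound since the capture took less than that).  The per-statement budget is an assumption on my part: it only holds if the import issues its statements one after another on a single connection, which I have not confirmed.

```python
def statement_rate(lines, seconds):
    """Average number of traced statements per second."""
    return lines / seconds


def serial_budget_ms(rate_per_second):
    """Average time available per statement, in ms, if they run strictly one at a time."""
    return 1000.0 / rate_per_second


rate = statement_rate(100_000, 5 * 60)
print(f"at least {rate:.0f} statements/s")
print(f"at most {serial_budget_ms(rate):.1f} ms per statement if executed serially")
```

If that assumption is anywhere near right, each statement gets only a few milliseconds including the round trip, so no single query needs to be slow for the import as a whole to crawl.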

 

Last, I noticed that it "freezes" every time, at least on my VMWare development boxes (I have over 12).  I wondered if it was caused by any customization we'd done, so I created an OOB publishing site with barely anything in it (not even variations, which are known to add LOTS of objects in Content Deployment, e.g. a new site can take 140 objects instead of 3).  I have about 1,100 objects and it takes 20+ minutes for a few MBs!!!!  The export & transport phases are fast, but the import is really slow.  Unfortunately, not only is it slow, it also stops at times.  During those pauses I do not see much SQL traffic and the imported object count does not go up.  It might do that for 10 minutes and then start again!
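To document these pauses for a support case, one option is to note the imported object count at regular intervals and flag the stretches where it does not move.  The Python sketch below does only the second part.  It assumes you already have a list of (timestamp in seconds, objects imported) samples, for example written down by hand from the job status page; how the samples are collected, the function name and the 5-minute threshold are all hypothetical.

```python
def find_stalls(samples, min_stall_seconds=300):
    """Return (start, end) intervals during which the object count did not increase.

    samples is a list of (timestamp_seconds, objects_imported), sorted by time.
    Only intervals lasting at least min_stall_seconds are reported.
    """
    stalls = []
    if not samples:
        return stalls

    last_progress_time, last_count = samples[0]
    last_seen_time = last_progress_time

    for timestamp, count in samples[1:]:
        if count > last_count:
            # Progress resumed: close the flat stretch that ended at the previous sample.
            if last_seen_time - last_progress_time >= min_stall_seconds:
                stalls.append((last_progress_time, last_seen_time))
            last_progress_time, last_count = timestamp, count
        last_seen_time = timestamp

    # A stall that is still ongoing at the last sample is reported too.
    if last_seen_time - last_progress_time >= min_stall_seconds:
        stalls.append((last_progress_time, last_seen_time))
    return stalls
```

A job that sits on the same count from one sample to another ten minutes later would be reported as a single interval, which is easier to hand to a support engineer than "it stops at times".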

 

Hopefully I'll have better news in the future regarding this.  Until then, I'm hoping it's a problem with my environments only or maybe a VMWare issue that we now have.  Just make sure you test with your staging environment before production ...

 

Maxime