Recover Documents from MOSS 2007 Database

Introduction

A couple years ago I wrote a prior post on how a simple VBS script can be used to extract a document from a SharePoint 2003/WSS 2.0 database (Recover Documents from SharePoint 2003 Database). After seeing the traffic routing to the post and with the adoption rate of MOSS 2007 it looks like an update is necessary.

This approach obviates the need to restore a content database into a MOSS 2007 environment if the intention is to extract a few critical documents.

Disclaimer

A few disclaimers are necessary. This script goes directly against a MOSS 2007 content database which is generally discouraged. Any code using the database directly will not be supported by MS Product Support Services. Instead, the SharePoint and WSS APIs are the way to go. If the database is modified directly rather than through the published SharePoint and WSS APIs Product Support Services cannot properly troubleshoot any unexpected issues. This script reads from the database so we're safe from errant modifications. However, the data structure the script queries could change in a future service pack.

Script

With that out of the way, let's proceed.  

The script queries the dbo.AllDocs table which contains the application documents and retrieves the most current version based on the document name which is then streamed out as binary data to a file:

Dim contentDatabase
Dim leaf
Dim outputPath

server = "[SERVERNAME]"
contentDatabase = "[CONTENTDATABASE]"
leaf = "[LEAFNODE]"
outputPath = "[OUTPUTPATH]"

ExtractDoc server, contentDatabase, leaf, outputPath

Sub ExtractDoc(server, contentDatabase, leaf, outputPath)

  Dim conStr, selectStr

  conStr = "Provider=SQLOLEDB;data Source=" + server + ";Initial Catalog=" + contentDatabase + ";Trusted_Connection=yes"

  selectStr = "SELECT dbo.AllDocStreams.Content FROM dbo.AllDocs "
  selectStr = selectStr + "INNER JOIN dbo.AllDocStreams "
  selectStr = selectStr + "  ON dbo.AllDocs.ID= dbo.AllDocStreams.ID "
  selectStr = selectStr + " AND dbo.AllDocs.Level = dbo.AllDocStreams.Level "
  selectStr = selectStr + " where LeafName='" + leaf +"' AND IsCurrentVersion=1"

  Set cn = CreateObject("ADODB.Connection")
  Set rs = CreateObject("ADODB.Recordset")
  cn.Open conStr
  Set rs = cn.Execute(selectStr)
  Set mstream = CreateObject("ADODB.Stream")
  mstream.Type = 1
  mstream.Open
  mstream.Write rs.Fields("Content").Value
  mstream.SaveToFile outputPath, 2
  rs.Close
  cn.Close
End Sub

Copy this code into Notepad and replace [SERVERNAME], [CONTENTDATABASE], [LEAFNODE] and [OUTPUTPATH] with appropriate values. Save this file as a VBS script and execute from the command line as:

C:\>CSCRIPT ExtractDoc.vbs

The SQL Query is a bit more complicated than the SharePoint 2003 version. It joins the dbo.AllDocs table with the dbo.AllDocStreams table which actually contains the blob Content field. There is also a dbo.AllDocVersions table, however, with versioning enabled this table does not appear to be updated as new versions are added. With each new version a new row is added to both the dbo.AllDocs and dbo.AllDocStreams tables. Conveniently, there is an IsCurrentVersion boolean field in the dbo.AllDocs table. The join between the dbo.AllDocs and dbo.AllDocStreams is done between the mutual uniqueidentifier ID fields and a Level field which appears to increment with each new version.

The LeafNode is the name of the file to retrieve. This sample script assumes that the document is in the root of the containing document library. If it were in a subdirectory an addtional DirName would need to be used in the query and passed as a parameter.

NOTE: This was tested with a MOSS 2007 content database. This was not tested with a WSS 3.0 content database, however, I expect the schema is the same. 

This is certainly not something to use in a production environment where automated document retrieval must be a repeatable and reliable process. But it is a quick and dirty means of extracting a document from a restored database that spares you from the overhead of restoring a SharePoint/WSS environment.

This was written using a simple VBS script so that production support folks can use it easily without having to compile a .NET assembly.

References

How to recover SharePoint document once deleted from recycle bin - He talks about using textcopy, which is part of the SQL Server 2000 Resource Kit, to perform the same task. In addition he has a number of screenshots and additional instructions which may prove helpful. The blog entry mentions that, although textcopy is not officially supported for use with SQL Server 2005, it does work. The kit is available for download if you are a MSDN subscriber. Otherwise, it's included in the book,