Digging into the content index

The content index is one of the most sought-after parts of SharePoint. Search is one of the most popular, and also one of the most mystical, of SharePoint's features, so it goes without saying that accessing the content index is like chasing a dream. But before you start jumping with glee: there is absolutely no way of accessing the content index directly. The object model does expose a certain surface area of it, though. I can, in fact, point out that the content index actually lives on the filesystem of the SharePoint server machine. But that's about it.

Let's have a look at what the object model has in store for us. There are a couple of namespaces in Microsoft.SharePoint.Portal.Admin.Search.dll that can help us scratch the surface of the content index.

The objects that are of interest are SearchSite, SearchCatalog and SearchContentSource.

using Microsoft.SharePoint;
using Microsoft.SharePoint.Portal.Admin.Search;
// the using directives to get SPSite/SPWeb and the above objects

SPSite site = new SPSite("https://localhost");
SPWeb web = site.OpenWeb();

SearchCatalog catalog;
SearchCatalog anotherCatalog;
SearchContentSource contentsource;
System.Collections.IEnumerator anEnumerator;
System.Collections.IEnumerator anotherEnumerator;
System.Collections.IEnumerator yetanotherEnumerator;

SearchSite srchSite = new SearchSite();
// Connect takes <server_name>, site GUID
srchSite.Connect("blr2r02-12", site.ID.ToString());

// this is gonna give you all the catalog names -
// btw you have to use the IEnumerator, no way out of it :-)
anEnumerator = srchSite.Catalogs.GetEnumerator();

// MoveNext() returns false once we run past the last catalog,
// so it has to be the loop condition (the enumerator itself never becomes null)
while (anEnumerator.MoveNext())
{
    catalog = (SearchCatalog)anEnumerator.Current;

    // here you can write out the catalog name using catalog.Name
    // WriteToFile is a custom function to output a string to a file
    WriteToFile(catalog.Name);

    // let's try to write out all the content sources that are part of each catalog
    // get a reference to a particular catalog
    anotherCatalog = srchSite.getCatalog(catalog.Name);

    // this is gonna give you all the content sources -
    // again no escape from the IEnumerator
    anotherEnumerator = anotherCatalog.ContentSources.GetEnumerator();

    while (anotherEnumerator.MoveNext())
    {
        contentsource = (SearchContentSource)anotherEnumerator.Current;

        // here you can get various properties like
        // depth, display name, source group, URL, etc
        WriteToFile(contentsource.Depth.ToString());
        WriteToFile(contentsource.DisplayName.ToString());
        WriteToFile(contentsource.SourceGroup.ToString());
        WriteToFile(contentsource.Url);

        // Now this is where we run into issues. BIG issues.
        // None of the methods or properties of the
        // SearchContentSource class return an object.
        // All the return types are base data types. What does this
        // translate to? Basically, this is the end of the road.
        // This is all the detail that you can get from the object model.
        // This, though not much, is still substantial enough to improve
        // what we know about the content sources.
    }
}
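When you walk a collection through an IEnumerator by hand, the usual pattern is to make MoveNext() the loop condition, since it returns false once you step past the last item, and to read Current only after a successful MoveNext(). Here is a minimal, self-contained sketch of that pattern. It uses a plain ArrayList as a stand-in for the Catalogs collection, so it runs without a SharePoint server; the EnumeratorPattern class, the Drain method and the sample strings are made up for illustration and are not part of the SharePoint object model.

```csharp
using System;
using System.Collections;

class EnumeratorPattern
{
    // Hypothetical helper: drives an IEnumerator by hand and
    // returns every item it yields, in order.
    public static ArrayList Drain(IEnumerable source)
    {
        ArrayList seen = new ArrayList();
        IEnumerator e = source.GetEnumerator();

        // MoveNext() advances to the next item and returns false
        // when there is nothing left, which ends the loop.
        while (e.MoveNext())
        {
            seen.Add(e.Current);
        }
        return seen;
    }

    static void Main()
    {
        // Placeholder strings standing in for catalog names.
        ArrayList names = new ArrayList();
        names.Add("first_catalog");
        names.Add("second_catalog");

        foreach (object name in Drain(names))
        {
            Console.WriteLine(name);
        }
    }
}
```

An empty collection is handled for free: the first MoveNext() returns false and the loop body never runs.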

So now, what resources can be helpful here? Let's see: the standard SharePoint Portal Server SDK is a big help.

If you don't already have it, here's where you can get a copy all for yourself: the SharePoint Portal Server SDK.

And here's where you can get the Windows SharePoint Services SDK.

Now, what about the class tree? Here is the PortalAdminSearch namespace.

 

 

Happy hunting

 

/H