Indexing MSCMS 2001 using MOSS 2007 indexer?

This might be a cakewalk for most of you, but it took me a while to get MSCMS 2001 content indexed using MOSS 2007.  After I set up a content source and started a full crawl, I saw the warning below:

The specified address was excluded from the index. The crawl rules may have to be modified to include this address. (The item was deleted because it was either not found or the crawler was denied access to it.)

I verified that the content access account specified in MOSS is also a CMS 2001 administrator, and I could browse to the default channel in CMS 2001 from the MOSS server.  And yet, the above warning!  After some trial and error, I figured out that anonymous access being enabled on the CMS 2001 server was causing the issue.

With anonymous access enabled on the MSCMS 2001 web site: when I browse to the CMS 2001 site via IE, I see a prompt to provide my domain credentials.  This essentially means that even though anonymous access is enabled, the CMS content itself is not anonymous, and Windows credentials are still needed to authenticate the user.

With anonymous access disabled on the MSCMS 2001 web site: when I browse to the CMS 2001 site via IE, I see the home page under the default channel directly.  I log in using a domain account that’s a CMS 2001 administrator.

What I think is happening here is that MOSS requests the URL provided and supplies the content access account, which is a domain credential.  When this request hits the IIS web site, IIS, being the first to handle the request, compares the credentials with the anonymous user and tells the MOSS indexer that it cannot serve the request.
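If you want to see for yourself how the web site answers a request that carries no credentials, a small probe can help. The sketch below is in Python and is only an illustration: it is not what the MOSS crawler does internally, the function names classify_auth and probe are my own, and the host name in the usage comment is a placeholder. It sends one request without credentials and reports whether the server served the page, asked for Windows credentials (Negotiate or NTLM), or answered with something else.

```python
import urllib.error
import urllib.request


def classify_auth(status, www_authenticate):
    """Summarize how a server answered a request sent without credentials.

    status is the HTTP status code; www_authenticate is the list of
    WWW-Authenticate header values (empty if the header was absent).
    """
    # The scheme is the first token of each header value, e.g. "NTLM".
    schemes = [v.split()[0].lower() for v in www_authenticate if v.strip()]
    if status == 401:
        if "negotiate" in schemes or "ntlm" in schemes:
            return "windows-auth-challenge"
        return "other-auth-challenge"
    if 200 <= status < 400:
        return "served-without-credentials"
    return "error"


def probe(url):
    """Send one request without credentials and classify the answer."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            headers = response.headers.get_all("WWW-Authenticate") or []
            return classify_auth(response.status, headers)
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx answers, including 401.
        headers = err.headers.get_all("WWW-Authenticate") or []
        return classify_auth(err.code, headers)


# Example call (placeholder host, replace with your CMS 2001 server):
# print(probe("http://cms2001-server/"))
```

Running the probe once with anonymous access enabled and once with it disabled should show whether the server's first answer differs between the two settings, which is the part of the handshake I am guessing about above.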

That’s it! I simply removed anonymous access from the CMS 2001 web site and the MOSS indexer started to index the content.  So it’s probably a good idea to make sure anonymous access is switched off when indexing CMS 2001 using MOSS 2007.  Cheers!

Comments (1)

  1. Louis says:

    When indexing CMS 2001, did you get a lot of "noise", i.e. was all content being indexed, including navigation items, static content, etc.?

    If so how did you work around this?