Moved from http://blogs.msdn.com/vijgang
This post describes the steps needed to get SharePoint to crawl a Forms Based Authentication (FBA) site, the issues you may run into, and how to resolve them.
You will need:
- Microsoft Office SharePoint Server 2007
- A site with forms based authentication (FBA) enabled (NOTE: this can also be a SharePoint FBA site)
We start by downloading addrule.exe to the SharePoint server. This tool is available at SharePoint Server 2007 Tool: Add/Edit Crawl Rules with Form/Cookie Credentials.
We then create an XML file to feed to addrule.exe. The format of this XML file is documented in Searching Sites Protected by Forms Authentication with Enterprise Search in SharePoint Server 2007, and also in SharePoint Server 2007 Tool: Add/Edit Crawl Rules with Form/Cookie Credentials.
I created this sample XML file:
<param name="login" public="true">Sign In</param>
<param name="__EVENTVALIDATION" public="true">/wEWBQKLhuipCQLE96mtBQLLtsPBAgLkkP7MCgK/lZyyB9CK4YpD9xxOo46u87JbhTsQ5AkW</param>
I then create a content source that points to the http://fbasite URL.
Then I run "addrule.exe myfba.xml" to create a crawl rule in the SharePoint SSP search settings.
Many standard FBA sites can be crawled with just these steps, but some will still fail. I found two reasons why this can happen:
- The XML file is missing some of the parameters that the FBA site expects.
- A parameter value specified in the XML file is not URL encoded.
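To illustrate the second issue: a browser submits form fields URL encoded, so a value such as "Sign In", or an __EVENTVALIDATION string containing "/", is not sent exactly as it was typed into the sample XML file. The minimal Python sketch below (Python is used here only for illustration; the two values are taken from the sample XML above) uses the standard library's urllib.parse.quote_plus to show the encoded form of each value.

```python
from urllib.parse import quote_plus, unquote_plus

# Values as typed into the sample XML file (not URL encoded).
raw_values = {
    "login": "Sign In",
    "__EVENTVALIDATION": "/wEWBQKLhuipCQLE96mtBQLLtsPBAgLkkP7MCgK/lZyyB9CK4YpD9xxOo46u87JbhTsQ5AkW",
}


def form_encode(value: str) -> str:
    """Encode a value the way a form field is encoded in a POST body:
    spaces become '+', and characters such as '/', '+' and '='
    become %XX escapes (quote_plus treats no characters as safe by default)."""
    return quote_plus(value)


for name, value in raw_values.items():
    encoded = form_encode(value)
    # Decoding the encoded form must give back the original value.
    assert unquote_plus(encoded) == value
    print(f"{name}: {value!r} -> {encoded!r}")
```

Here "Sign In" becomes "Sign+In", and each "/" in the __EVENTVALIDATION value becomes "%2F". Encoding each value by hand is easy to get wrong, which is why the approach described next copies the values straight from the request the browser actually sent.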
The solution is the same for both issues: use Fiddler, an HTTP debugging proxy that lets us inspect the HTTP traffic between the client (the browser) and the server.
The steps to fix this are:
- Install and start Fiddler.
- Browse to http://fbasite; the login page should appear.
- Enter the credentials to sign in.
- Make sure you logged in successfully.
- Switch to the Fiddler window and double-click the session in the Web Sessions list that points to the login page.
- In the right-hand pane, select the "Session Inspector" tab and click the "Raw" view.
- This is what you should see:
The section highlighted in red is the interesting part. These are the parameters the browser sends to log you in, and they are exactly what SharePoint needs to log in to the FBA site. The text is formatted like this:
name1=value1&name2=value2&name3=value3
If we copy every parameter in that string, along with its value, into the addrule XML file, SharePoint should be able to log in the same way you did in the browser.
Copying all the parameters resolves both issues: no parameter is missed, and the values are already URL encoded because they are copied exactly as the browser sent them.
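Copying the parameters by hand is tedious when the body is long (ASP.NET pages typically post a large __VIEWSTATE field). The Python sketch below splits a raw request body, as copied from Fiddler's Raw view, on "&" and "=" and prints one <param> element per field, keeping names and values exactly as the browser sent them (that is, still URL encoded). Assumptions: the body is an application/x-www-form-urlencoded form post; the element layout, including public="true" on every element, simply mirrors the two <param> lines in the sample XML above (check the addrule documentation for how credential fields should be marked); the rest of the addrule XML file is not generated here; the function name body_to_params and the field names in the example body are hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr


def body_to_params(body: str) -> list[str]:
    """Turn a form-urlencoded request body (as shown in Fiddler's Raw view)
    into <param> lines for the addrule XML file.

    Names and values are copied verbatim, so they stay URL encoded exactly
    as the browser sent them; only XML-special characters are escaped.
    """
    lines = []
    for pair in body.strip().split("&"):
        if not pair:
            continue  # tolerate a trailing or doubled '&'
        # Split on the first '=' only; a field with no '=' gets an empty value.
        name, _, value = pair.partition("=")
        # Assumption: every parameter is marked public="true", as in the sample.
        lines.append(
            f'<param name={quoteattr(name)} public="true">{escape(value)}</param>'
        )
    return lines


if __name__ == "__main__":
    # Hypothetical body; paste the real one from Fiddler instead.
    raw_body = "__VIEWSTATE=%2FwEPDwUK&txtUser=crawler&txtPassword=p%40ss&login=Sign+In"
    for line in body_to_params(raw_body):
        print(line)
```

For the hypothetical body above this prints, among others, <param name="login" public="true">Sign+In</param>, so the value lands in the XML file in its encoded form rather than as "Sign In".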
NOTE: If the site is a SharePoint FBA site, the recommendation is to extend the site and map it to an NTLM site, and then crawl the NTLM site. The article Prepare to crawl host-named sites that use forms authentication covers this. If you still want to crawl the SharePoint site using the FBA credentials, you need to make this additional configuration change in the crawl rule:
- Browse to the Shared Services page -> Search Settings -> Crawl Rules
- Edit the crawl rule that was generated by the addrule.exe tool.
- Select the "Crawl SharePoint content as HTTP pages" check box.
Your SharePoint site will now be crawled as a standard HTTP site using the FBA credentials, but note that you will lose much of the SharePoint-specific functionality.
Note: If you are using Microsoft Search Server 2008 (MSS), it provides a UI that simplifies this process. An update is planned to bring the MSS features into MOSS; once that ships, the UI should take care of finding the parameters and creating the crawl rule.
Other reference articles:
- Searching Sites Protected by Forms Authentication with Enterprise Search in SharePoint Server 2007
- Prepare to crawl host-named sites that use forms authentication
- Sites that require forms-based authentication or cookie-based authentication are not crawled in SharePoint Server 2007
- Configure forms-based authentication (Office SharePoint Server)