I ran into a fairly unusual situation with crawl rules and search. Here's what happened…
- I configured crawl rules to ensure that the crawler does not crawl the internet.
- I started by adding an inclusion of `*://*.somecompany.com/*` to include all URLs under the company domain.
- I then added an exclusion of `*://*.*` to ensure that the crawler does not crawl any other domain unintentionally. Sure, this adds more management hassle under the crawl rules, but it ensures you don't cause other companies a lot of pain.
- Next I started a crawl and found that all of the user profiles returned "content excluded because of no-index attribute". This should not have happened, though, because I can watch the crawl work as soon as I remove the crawl rules.
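As I understand it, crawl rules are evaluated in order and the first matching rule wins, which is why the inclusion has to come before the catch-all exclusion. Here is a rough sketch of that logic in Python (a simplified model using shell-style wildcards, not the actual crawler code):

```python
from fnmatch import fnmatch

# Simplified model of ordered crawl rules: the first matching rule wins.
# The patterns mirror the rules above; the real crawler's wildcard
# semantics may differ from fnmatch's.
RULES = [
    ("include", "*://*.somecompany.com/*"),  # company domain
    ("exclude", "*://*.*"),                  # everything else
]

def should_crawl(url):
    """Return True if the first rule that matches the URL is an inclusion."""
    for action, pattern in RULES:
        if fnmatch(url, pattern):
            return action == "include"
    return False  # no rule matched: don't crawl

print(should_crawl("http://intranet.somecompany.com/sites/hr"))  # True
print(should_crawl("http://www.othercompany.com/"))              # False
```

With this ordering, company URLs hit the inclusion first, and everything else falls through to the exclusion.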
After lots of internal emails, banging my head against the desk, etc., I decided it had to be something simpler. Luckily I had server access, and I noticed that when I fully qualify URLs, IE stops sending an NTLM authentication ticket. You can work around this by adding the URL to your Intranet zone. I then went into the content source and noticed all the URLs were FQDNs. I switched to the local names (took off the domain suffix) and, voilà, everything works with the crawl rules. Why the crawl rules caused the crawler not to send the NTLM ticket, or whatever else the issue was, I have no idea. All I do know is that nothing showed up in the event log or the trace log, even on verbose. The long-term solution will be to add the proper registry keys as detailed in this support article: http://support.microsoft.com/kb/303650/en-us.
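For reference, the workaround of putting the FQDNs in the Local intranet zone can also be done through the ZoneMap registry keys. A sketch of what that might look like as a .reg fragment (the key path, value name, and the zone value of 1 for Local intranet are my assumptions; follow the KB article above for the authoritative steps):

```
Windows Registry Editor Version 5.00

; Assumption: maps http://somecompany.com URLs into the Local intranet
; zone (1) so NTLM credentials keep flowing for fully qualified names.
; Verify the key names and values against the KB article before applying.
[HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Internet Settings\ZoneMap\Domains\somecompany.com]
"http"=dword:00000001
```

Note this per-user key only affects the account it is applied under, so for a crawl account you would need it in that account's hive.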
Hope I saved someone a few days of headaches.