MOSS Search: How to control which content is crawled

08/31/2007

This morning I was answering a DL question. The question was: "How do I control which content Search crawls? There are some paths (URLs) that I do not want to crawl."

I was wondering if there is a way to configure MOSS Search to exclude the path and library names from the search results?

Yes! You can easily control which content is crawled using the following technique.

Create a new page, Index.html, and use it to control what gets crawled. Details:

1. Create a new path with only one page in it (Index.html).

2. Add all the paths (URLs) you want to crawl to the Index.html page. [Do not include paths you do not want to crawl.]

3. Use the Index.html page URL to define the content source. Specify the newly created path where the index file is (say, https://MyServer/Search/Index.html):

- Content Source Type (select the Web Sites radio button)
- Start Address (type the Index.html page URL)

4. In the crawl settings, use the custom option to control server hops and page depth:

- Choose the radio button option "Custom - specify page depth and server hops"
- Use the Limit Page Depth and Limit Server Hops options to control which content is crawled
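
Steps 1 and 2 come down to producing a plain HTML page of links. Here is a minimal Python sketch of one way to generate it. The function name `build_index_html` and the two team-site URLs are hypothetical, made up for illustration; only the Index.html idea comes from the steps above.

```python
from html import escape


def build_index_html(urls, title="Crawl index"):
    """Return a minimal HTML page with one link per URL that should be crawled."""
    links = "\n".join(
        f'    <li><a href="{escape(u, quote=True)}">{escape(u)}</a></li>'
        for u in urls
    )
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        f"<head><title>{escape(title)}</title></head>\n"
        "<body>\n"
        "  <ul>\n"
        f"{links}\n"
        "  </ul>\n"
        "</body>\n"
        "</html>\n"
    )


# Hypothetical paths to include. Anything you do not want crawled is simply left out.
urls_to_crawl = [
    "https://MyServer/sites/teamA/Shared%20Documents/",
    "https://MyServer/sites/teamB/default.aspx",
]

page = build_index_html(urls_to_crawl)
print(page)
```

Save the printed output as Index.html in the new path and use its URL as the start address in step 3.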

This way you have full control over the content to be crawled.
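
To get a feel for how the two limits in step 4 interact, here is a simplified Python model. It is not the MOSS crawler. It assumes page depth means the number of links followed from the start address and server hops means the number of times a link path moves to a different host. The function `crawl_scope` and the link graph are hypothetical.

```python
from collections import deque
from urllib.parse import urlsplit


def crawl_scope(start, links, max_depth, max_hops):
    """Return the URLs reachable from `start` within the depth and hop limits.

    links: dict mapping a URL to the list of URLs that page links to.
    """
    best_hops = {start: 0}  # fewest server hops seen so far for each URL
    queue = deque([(start, 0, 0)])  # (url, depth, hops)
    while queue:
        url, depth, hops = queue.popleft()
        if depth == max_depth:
            continue  # depth budget used up, do not follow links from here
        for target in links.get(url, []):
            crossed = urlsplit(target).netloc.lower() != urlsplit(url).netloc.lower()
            next_hops = hops + int(crossed)
            if next_hops > max_hops:
                continue  # would exceed the server hop limit
            if target in best_hops and best_hops[target] <= next_hops:
                continue  # already reached at least as cheaply
            best_hops[target] = next_hops
            queue.append((target, depth + 1, next_hops))
    return set(best_hops)


# Hypothetical link graph rooted at the Index.html page.
start = "https://MyServer/Search/Index.html"
links = {
    start: ["https://MyServer/sites/teamA/", "https://OtherServer/docs/"],
    "https://MyServer/sites/teamA/": ["https://MyServer/sites/teamA/page1.aspx"],
    "https://OtherServer/docs/": ["https://OtherServer/docs/a.html"],
}

for depth, hops in [(1, 0), (2, 0), (2, 1)]:
    scope = crawl_scope(start, links, max_depth=depth, max_hops=hops)
    print(f"depth={depth} hops={hops}: {len(scope)} URLs")
```

In this model, a server hop limit of 0 keeps the crawl on the server that hosts Index.html, and the page depth limit decides how far below the listed paths it goes.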
