MOSS Search: How to control which content is crawled


This morning I was answering a question on a distribution list (DL). The question was: "How do I control which content Search crawls? There are some paths (URLs) that I do not want crawled.


I was wondering if there is a way to configure MOSS Search to exclude certain paths and library names from the search results."



Yes! You can easily control which content is crawled using the following technique.


Create a new page, Index.html, and use it to control what gets crawled. The details:


 


1.     Create a new path with only one page in it (Index.html)


2.     Add links to all the paths (URLs) you want to crawl to the Index.html page. [Do not include paths you do not want crawled]


3.     Use the Index.html page URL to define the Content Source: specify the newly created path where the Index file is (say, http://MyServer/Search/Index.html)





    • Content Source Type (select the Web Sites radio button)


    • Start Address (type the Index.html page URL)



4.     In the crawl settings, use custom settings to control server hops and page depth





    • Choose the radio button option "Custom - specify page depth and server hops"


    • Use the Limit Page Depth and Limit Server Hops options to control which content is crawled
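
If there are many paths to list, the Index.html page from steps 1 and 2 can be generated rather than typed by hand. Here is a minimal sketch in Python. The function name build_index_page and the two example URLs are made up for illustration; replace them with the paths you actually want crawled.

```python
from html import escape


def build_index_page(urls, title="Crawl index"):
    """Return a minimal HTML page containing one link per URL in `urls`."""
    links = "\n".join(
        f'    <li><a href="{escape(u, quote=True)}">{escape(u)}</a></li>'
        for u in urls
    )
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        f"<head><title>{escape(title)}</title></head>\n"
        "<body>\n"
        "  <ul>\n"
        f"{links}\n"
        "  </ul>\n"
        "</body>\n"
        "</html>\n"
    )


# Hypothetical paths to include. Anything not listed here is simply left out.
urls_to_crawl = [
    "http://MyServer/Sites/TeamA/Shared%20Documents/",
    "http://MyServer/Sites/TeamB/Pages/",
]

page = build_index_page(urls_to_crawl)

if __name__ == "__main__":
    # Write the page so it can be published at the start address.
    with open("Index.html", "w", encoding="utf-8") as f:
        f.write(page)
```

Publish the resulting file at the location you use as the start address (http://MyServer/Search/Index.html in the example above).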

This way you have full control over the content that is crawled.
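
To get a feel for how the two limits bound a crawl that starts at Index.html, here is a toy model in Python. It is not how the MOSS crawler is implemented. It assumes that page depth counts links followed from the start address and that a server hop is counted each time a link leads to a different host. The link graph and all URLs other than the Index.html address are invented for the example.

```python
from collections import deque
from urllib.parse import urlparse


def host(url):
    """Return the lower-cased host part of a URL."""
    return urlparse(url).netloc.lower()


def reachable(start, links, max_depth, max_hops):
    """Return the URLs reachable from `start` within both limits.

    links: dict mapping a URL to the list of URLs it links to (toy graph).
    max_depth: maximum number of links followed from the start page.
    max_hops: maximum number of host changes along a path.
    """
    seen = {(start, 0)}             # (url, hops used to get there)
    queue = deque([(start, 0, 0)])  # (url, depth, hops)
    while queue:
        url, depth, hops = queue.popleft()
        if depth == max_depth:
            continue  # depth limit reached, do not follow links from here
        for target in links.get(url, []):
            next_hops = hops + (host(target) != host(url))
            if next_hops > max_hops or (target, next_hops) in seen:
                continue
            seen.add((target, next_hops))
            queue.append((target, depth + 1, next_hops))
    return {url for url, _ in seen}


index = "http://MyServer/Search/Index.html"
page_a = "http://MyServer/Docs/A.aspx"
page_b = "http://MyServer/Docs/B.aspx"
unlisted = "http://MyServer/Private/Secret.aspx"
external = "http://OtherServer/Page.aspx"

# Hypothetical link graph: Index.html lists A and B, and A links further on.
links = {
    index: [page_a, page_b],
    page_a: [unlisted, external],
}

only_listed = reachable(index, links, max_depth=1, max_hops=0)
```

In this model, a depth of 1 with 0 hops reaches only Index.html and the pages it lists. Raising the depth to 2 also pulls in the unlisted page that A links to, and allowing 1 hop pulls in the page on the other server. Check the behaviour of your own farm before relying on specific values.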
