Proof of Concept - Getting a Feel for the Terrain

Ah! A new year (yes, I know it is January 26, but I started writing this on the 6th)!

Starting with this post, I am going to discuss implementing a basic evaluation of Fast Search Server 2010 for SharePoint (FS4SP) that can be used as a guideline for POCs or entry-level learning.

A basic test would normally consist of:

  • Installing SharePoint and Fast
  • Locating some content to index
  • Indexing the content
  • Executing a query to return some content
  • Repeating with one or more additional content sources

The above list translates into:

  • Download and run the TechNet VHD on Windows 7
  • Set up a shared folder containing documents to be indexed
  • Index some documents located somewhere (on the file system, a web site, etc.)
    • Locate content
    • Configure the connector
    • Index the content
  • Execute a query to return some content
    • Set up the Fast Search Center
    • Execute a query on the existing index
  • Repeat with one or more additional content sources
    • Web sites
    • SharePoint
    • Azure
  • Add an entity extractor
  • Add custom code to the pipeline extensibility stage
    • Regular expressions
    • Business logic
    • Cleaning up existing properties
    • Add additional properties
  • Add synonyms
  • Add Visual Best Bets
  • Index XML files
    • With and without embedded documents
  • Index structured content
    • With and without embedded documents

Over time we will:

  • Download/extract the TechNet VHD
  • Deploy the VHD to a VM through VirtualBox (Windows Virtual PC on W7 does not support 64-bit guests, and Hyper-V is not available on W7)
  • Map a folder with content on our local machine to a shared folder on the VM
  • Configure the ContentSSA to crawl a folder, and any sub-folders, and index them in Fast
  • Configure the query to ignore folders indexed from a file system
  • Map crawled properties to managed properties
  • Index documents stored in the local SharePoint repository
  • Index documents from a web site
  • Add synonyms (one at a time and in bulk)
  • Create entity extractors
  • Create Visual Best Bets
  • Automate as much as possible using PowerShell
  • Pipeline extensibility stage: regular expressions and business logic
  • Index XML files
  • Index a database using the JDBC connector and BCS
  • External client connectivity to Fast

We will try to cover as much functionality as possible in byte-sized chunks to gain the understanding needed to really take advantage of the FS4SP platform.

Remember: we might never complete the first round of tutorials, much less the additional pieces, but we will do what we can. And I will be doing these in whatever order happens to be convenient, or not at all. Just sayin'.

What we are not going to do:

  • Index millions of documents (though that is a thought)
  • Connect to Documentum, Lotus Notes or Exchange
  • Set up gnarly connector configurations (this is to get everyone started, remember?)
  • Build a truly snazzy search-driven application (maybe next time)

With all of the above said, there will not be a lot of in-depth explanation of how a particular feature works or the whys and wherefores of using it. I will endeavor to link to public documentation that discusses the area in question, and we will move on.

This will be so barebones that we will run the VHD on Windows 7 (64-bit). That makes it possible for many of you to experience Fast without having to install Windows Server 2008 R2. We will be using the 64-bit version of W7 to ensure that you can allocate enough RAM to the VM.

My hardware:

  • HP EliteBook 8540w
    • Quad-core CPU
    • 16 GB RAM
    • 320 GB hard drive with 158 GB free (not required, but more space is always better than less)

My software:

  • Windows 7 Enterprise with all of the latest updates
  • VirtualBox 4.1.8 (as updates become available I will be upgrading as well)
  • Random documents taken off the web

If you have a quad-core notebook with 16 GB of RAM, you should be all set.

Warning: you are responsible for downloading and installing VirtualBox yourself. I will not cover how to install VirtualBox, but I will cover how to set up the VHD so that it becomes the boot drive for a VirtualBox VM, and the minimum configuration you can get away with. Don't worry: if I can run this on a quad-core machine with 16 GB, so can you (we'll just be judicious in our use of resources... or use another notebook for note taking).

The Story So Far

POC - Part 1: Creating the VM - Extracting the VHD

POC - Part 2: Creating the VM - Deploying the VHD Using VirtualBox

POC - Part 1: Search - Mapping the Content Folder

POC - Part 2: Search - Configuring the connector/Indexing Content

    POC - Part 2a: Ignoring Indexed Folders

    POC - Part 2b: PowerShell: Configuring the connector/Indexing Content

Next time: Let's go get the VHD!