SP2013: Understanding storage locations for files gathered by the Crawl Component

When gathering files from a content source, the SharePoint 2013 Crawl Component can be a very I/O-intensive process – locally writing all of the files it gathers from content repositories to its temporary file paths, where they are then read by the Content Processing Component during document parsing. This post can help you understand where…


Crushing the 1-million-item-limit myth with .NET Search Connector [BDC]

Ever heard the one about not being able to crawl more than a million or two rows from a single source using SharePoint Business Connectivity Services (BCS)? In this post, I plan to dispel this myth and instead show that large crawls tend to fall over because of overly large enumerations. I then provide a strategy to overcome…


SharePoint Search *Quirks: Query Variables

In several forums, emails, and discussions, I keep seeing this recurring question, “How can I limit results to a specific library [in this site collection]?” Turns out, this was more difficult than I originally thought …until I found mention of the escape character “\” (essentially as a side note in an example) in this invaluable…


Problems Crawling the non-Default zone *Explained

In a nutshell, there is an undocumented assumption baked into SharePoint Search that the Default Public URL of a Web Application will be crawled. If you want everything to work auto-magically, crawl the Web Application’s Public URL for the Default zone (*Note: the crawler requires Windows Authentication [NTLM or Kerberos] in whatever zone your crawl…


Case Sensitivity and Duplicate URLs Getting Crawled

I’ve seen several scenarios where a single document gets crawled twice and leads to duplicate results for this particular item – two entries in the Crawl Log with the same display URL, but with different Doc IDs. This isn’t the typical scenario where multiple very similar documents get calculated as “Near Duplicate” items, but rather…


Cheat Sheet: Finding the *real Crawl State

Ever had a crawl that seemed stuck in Starting, Stopping, Pausing, or even Crawling and thought… hmm, now what? Well, part of the problem is that only part of the crawl state is shown in either the UI or PowerShell. Crawls actually have a sub-status, in addition to the status shown in the UI, that tells you more…


From SPC14: “Troubleshoot Search” session (spc375)

Wow! I wanted to send a huge thank you to the more than 500 folks who attended our session today and for the very gracious feedback. Here is a link to our session on Channel9: How to manage and troubleshoot Search: A practical guide (SPC375) https://channel9.msdn.com/Events/SharePoint-Conference/2014/SPC375 And as promised, here’s the PowerShell that I built for filtering the ULS by…


SharePoint Search and Deadlocks in SQL Server

Deadlocks reported in the Search databases, particularly the Crawl Store database (which manages the state of each crawled item and is very I/O intensive), are not abnormal and can occur because of the concurrent and asynchronous nature of crawl processing (for additional information on the crawling process, see my previous post here). Below, I provide…


Shameless Self-Promotion: Presenting at SPC14

Coming to SharePoint Conference 2014? Then I invite you to come see Jon Waite and me present on Wednesday, March 5: How to manage and troubleshoot Search: A practical guide. In this session, we plan to be light on slides and heavy on live demos to troubleshoot hung crawls, identify feeding errors, and isolate query failures….