Insights into MS IFilter Testing Strategy.


Ever since I started dealing with filters, I’ve seen numerous questions along the lines of “What does proper validation of an IFilter mean? What tests should we execute, and how do we execute them?” Hence, it’s only appropriate that we publish a document detailing our rigorous test procedure so that everyone targeting components at MS Search products can benefit from it.


 Disclaimer: The following list presents only a subset of the testing methodologies we apply at MS Search and is by no means meant to be a quick recipe for weeding out ALL security vulnerabilities in your filter. The list is meant to provide an overview of the issues one should think about while testing and implementing filters.


—————————————————————————————————————-


A. Architectural Considerations : – COMPLIANCE REQUIRED


1. The filter DLL does not require the client to be installed on the indexing machine.
2. The filter DLL does not reference other binaries at compile time.
3. The filter DLL is monolithic and self-contained, without any external dependencies.
For an overview of the problems caused by non-monolithic DLLs, please see:
http://blogs.msdn.com/ifilter/archive/2006/11/20/breaking-the-monolithic-filter-dll.aspx


B. Threading Model: – COMPLIANCE REQUIRED


The filter’s threading model must be marked as either “Both” or “Free” under:
HKEY_CLASSES_ROOT\CLSID\{GUID}\InprocServer32

HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID\{GUID}\InprocServer32

We recommend using the “Both” threading model. An object that is marked with a threading model of “Both” takes on the threading model of the thread that created it. Marking the threading model as “Both” requires that the filter be thread-safe.
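As a quick self-check, the registry value can be read and validated with a small script. This is only a sketch: the function names are ours, treating the value as case-insensitive is an assumption, and the registry read uses Python's standard `winreg` module, which exists only on Windows.

```python
import sys

# Values this post accepts for a filter's ThreadingModel registry entry.
ACCEPTED_MODELS = {"both", "free"}


def is_accepted_threading_model(value):
    """Return True if the registry string is "Both" or "Free".

    Case-insensitive comparison is an assumption made for this sketch.
    """
    return isinstance(value, str) and value.strip().lower() in ACCEPTED_MODELS


def read_threading_model(clsid):
    """Read ThreadingModel for a CLSID from HKEY_CLASSES_ROOT (Windows only)."""
    import winreg  # standard library, available only on Windows

    path = r"CLSID\{}\InprocServer32".format(clsid)
    with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, path) as key:
        value, _value_type = winreg.QueryValueEx(key, "ThreadingModel")
    return value


if __name__ == "__main__" and sys.platform == "win32" and len(sys.argv) > 1:
    # Pass your filter's CLSID, braces included, as the first argument.
    model = read_threading_model(sys.argv[1])
    verdict = "OK" if is_accepted_threading_model(model) else "NOT COMPLIANT"
    print(model, verdict)
```

The validation is kept separate from the registry read so it can be exercised on any machine.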


C. OS Versions Supported:


The filter should support the following OS versions:
-> WinXP & Win2K3: Filtering of <document format> should be checked with WDS 3.0.
-> For Vista and Longhorn, use the built-in search facility.


D. Backward Compatibility with SPS 2003 :



1. Register the filter DLL with SPS 2003.
2. Create a content source with your documents, then crawl and query.


E. Loading Mechanisms : – COMPLIANCE REQUIRED


The filter needs to support all three loading mechanisms (IPersistStream, IPersistStorage and IPersistFile) for backward and forward compatibility reasons. We recommend trying to load via IPersistStream and falling back to IPersistStorage or IPersistFile only if IPersistStream is not supported.
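The preference order can be written down as a tiny selection routine. This sketch only models the decision, not the COM calls. The function names are hypothetical, and the relative order of the two fallbacks is our assumption, since the recommendation above only says that IPersistStream comes first.

```python
# IPersistStream first; the order of the two fallbacks is an assumption.
PREFERRED_ORDER = ("IPersistStream", "IPersistStorage", "IPersistFile")


def pick_loading_interface(supported):
    """Return the first preferred interface the filter supports, or None."""
    for name in PREFERRED_ORDER:
        if name in supported:
            return name
    return None


def missing_interfaces(supported):
    """List the loading mechanisms a compliant filter still has to add."""
    return [name for name in PREFERRED_ORDER if name not in supported]
```

A filter that passes this section's requirement yields an empty list from `missing_interfaces`.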

The IFilterExplorer can be used to check which loading mechanisms are supported:
http://www.citeknet.com/Products/IFilters/IFilterExplorer/tabid/62/Default.aspx


F. Dedicated support for 64 bit platforms :



For 64-bit platforms, there should be no dependency on 32-bit binaries, i.e., nothing that has to run under WOW64.
Run Depends.exe (Dependency Walker) to check that all dependencies are satisfied, which prevents load failures at run time.

Known Issue: A dependency on MSJAVA.dll shows up in red in Dependency Walker. You can safely ignore this.
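Before opening Dependency Walker, it can be handy to confirm that the filter binary itself is a 64-bit image. The sketch below reads the Machine field from the PE/COFF header. The function names are ours, and it inspects only the one file you give it, so it does not replace a full dependency check.

```python
import struct

# Machine values from the PE/COFF header.
MACHINE_I386 = 0x014C   # 32-bit x86
MACHINE_AMD64 = 0x8664  # 64-bit x64
MACHINE_IA64 = 0x0200   # 64-bit Itanium


def pe_machine(data):
    """Return the Machine field of a PE image passed in as bytes."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not a PE image: missing MZ header")
    # e_lfanew at offset 0x3C points to the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if len(data) < pe_offset + 6 or data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE image: missing PE signature")
    # The Machine field is the first 16-bit value after the signature.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return machine


def is_64bit_image(data):
    """True when the image targets x64 or Itanium."""
    return pe_machine(data) in (MACHINE_AMD64, MACHINE_IA64)
```

To use it, read the DLL with `open(path, "rb").read()` and pass the bytes in.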


G. Code Coverage:


We recommend at least 70% code coverage. This can be easily profiled using VS 2005 Team System.


H. IFiltTst – Consistency, Legitimacy and Illegitimacy tests:


http://msdn2.microsoft.com/en-us/library/ms692580.aspx


I. Security tests with Fuzzing :


1. Generate a minimum of 0.5 million fuzzed documents for each document format handled by the filter and feed them to FilterTest.
2. Have PageHeap enabled throughout the fuzz test run.
3. Analyze any heap corruptions, stack overflows, buffer overruns, crashes, etc., and fix the underlying bugs.

PageHeap can be enabled with AppVerifier. Download it here:
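To give a feel for what producing fuzzed documents involves, here is a minimal mutation sketch. It is not our internal fuzzer. The function names, the byte-overwrite strategy and the default of 8 mutations per file are all assumptions made for illustration.

```python
import random


def mutate(data, flips, seed):
    """Return a copy of data with up to `flips` randomly chosen bytes overwritten."""
    rng = random.Random(seed)  # seeded so a crashing variant can be regenerated
    out = bytearray(data)
    if not out:
        return bytes(out)
    for _ in range(flips):
        pos = rng.randrange(len(out))
        out[pos] = rng.randrange(256)
    return bytes(out)


def fuzz_variants(data, count, flips=8):
    """Yield `count` reproducible variants; the variant index doubles as the seed."""
    for seed in range(count):
        yield mutate(data, flips, seed)
```

Because every variant is derived from its index, a crash on variant 31337 can be reproduced later without keeping all the generated files on disk.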
http://www.microsoft.com/downloads/details.aspx?familyid=bd02c19c-1250-433c-8c1b-2619bd93b3a2&displaylang=en


NOTE: The fuzzer is an internal tool. A list of external fuzzers is provided here: http://www.infosecinstitute.com/blog/2005/12/fuzzers-ultimate-list.html


Again, use these at your own risk :)


J. Performance Scaling:


Optimum usage of processors in a server environment is crucial for performance. The goal is to achieve 80% performance scaling with the addition of each new processor; in the formula below, this means 1.8 times the throughput each time the thread count doubles. Here’s the test outline:
1. On a quad-processor machine, use ifilttst.exe with one thread to filter a large corpus of documents and note the time taken (T1).
-> Now use ifilttst.exe with two threads to filter the same corpus. The time taken should be about 0.556 * T1.
-> With the addition of each subsequent thread, the expected time T2 can be found with the formula:
T2 = T1 * 1/[(1.8)^(log2 N)] where T1 is the time taken with one thread and N is the number of threads.
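The target times are easy to tabulate before a test run. The sketch below simply evaluates the formula above; the function name and the `scaling` parameter are ours.

```python
import math


def expected_time(t1, threads, scaling=1.8):
    """Target filtering time for `threads` threads, given single-thread time t1.

    Implements T2 = T1 / scaling ** log2(N) from the test outline.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return t1 / (scaling ** math.log2(threads))


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(n, round(expected_time(100.0, n), 1))
```

For a single-thread time of 100 minutes, the target is about 55.6 minutes with two threads and about 30.9 minutes with four.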


K. AppVerifier Tests :


http://msdn2.microsoft.com/en-us/library/aa480483.aspx


L. Globalization:


If the document format allows the language/locale of the contents to be marked (e.g., MS Word), filtering of documents marked with those language tags must be verified. This is important because the filter emits locale information based on the language of the document, which MSSearch uses to invoke the correct WordBreaker and Stemmer for the document.


M. Registry and File I/O:


1. Use Filemon.exe with the Filemon filter set to the name of your DLL and verify that the IFilter initiates no file system I/O other than on the documents it is indexing. Take special note if the filter creates temp files.
2. Use Regmon.exe to verify that no registry read/write operations are performed.

www.sysinternals.com has both 32 and 64 bit versions of Filemon and Regmon.


N. Prefix/Prefast for Vista :


In the Office team, OACR checks for this when we build with the Windows Prefast requirements. In other environments, however, we need to use the Visual Studio build Configuration Manager to enable Prefast error checking.

More info (MS employees):
PREFIX internal website
PREFAST: wrapped in OACR


WWW Resources:


http://msdn2.microsoft.com/en-us/library/ms933794.aspx 


O. Calls to undocumented Windows APIs :



Run APIScan to ensure we do not make any calls to undocumented Windows APIs.


Note: This requirement is solely for MS and MS partners, to avoid situations like the Secret API fiasco.


P. SAL annotation :



SAL annotation is an excellent way to weed out potential security flaws in the code. More info at:
http://msdn2.microsoft.com/en-us/library/ms235402(VS.80).aspx


Q. UI Popups :



Use Filtdump to filter the document and ensure there are no UI popups.


R. International Sufficiency:


We’ve seen a lot of issues in the past where Unicode / DBCS characters were not handled correctly by IFilters and Protocol Handlers. The problem is a bit more serious in Protocol Handlers, as the address of the content source might be encoded in a DBCS charset, in which case data retrieval fails.


  • Use multiple special Unicode characters in the file contents and test for their output. The following figure provides a sample of Unicode characters to test:
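A test document of this kind can also be generated programmatically. The sketch below builds a small text with characters from a few tricky categories and checks which of them survive in a filter's output. The characters are our own illustrative picks rather than the set from the figure, the function names are hypothetical, and a filter that normalizes text may legitimately change the combining-mark sample.

```python
# Illustrative samples only; extend with characters relevant to your format.
SAMPLES = {
    "latin with diacritic": "\u00e9",
    "combining mark": "e\u0301",
    "CJK ideographs": "\u4e2d\u6587",
    "right-to-left (Hebrew)": "\u05e9\u05dc\u05d5\u05dd",
    "outside the BMP (surrogate pair in UTF-16)": "\U0001d11e",
}


def build_test_text():
    """Return one labeled line per sample, ready to paste into a test document."""
    return "\n".join("{}: {}".format(label, chars) for label, chars in SAMPLES.items())


def missing_samples(filter_output):
    """Return the labels whose characters do not appear in the filter's output."""
    return [label for label, chars in SAMPLES.items() if chars not in filter_output]
```

Feed `build_test_text()` into a document of your format, run the filter, and pass the emitted text to `missing_samples`; an empty list means every sample came through intact.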


S. Security Code Review:


This is the final line of defense against introducing security bugs in your code. DO NOT skimp on this!!! 🙂

    Comments (3)

    1. Matt Ellis says:

      Hi guys. Nice post, but where can I get a hold of ifilttst.exe? It doesn’t seem to live in the Windows SDK any more. I can get FiltDump.exe from a copy of the Windows Search SDK (which I think has been superceded by the Windows SDK).

      Any chance of a download of the latest versions of these tools? (IFiltTst, FiltDump, and I think it’s FiltReg?)

      Cheers

      Matt

    2. Stephan Mühlstrasser says:

      Hello Deb,

      will there ever be 64-bit versions of the lrtest and ifilttst tools? As 64-bit processors become more and more common, I would expect Microsoft to release the Windows Server resource kit also for 64-bit… It’s hard to test and debug IFilters on 64-bit without lrtest and ifilttst.

      Best Regards

      Stephan

    3. ken says:

      Hi Deb,

      I tried emailing you via this blog's contact link, but I guess you don't check it that often 😛

      Do you have any examples of how to get an IFilter to return a multi-value/multivalue from IFilter::GetValue?

      I tried wrapping the COM values in a SAFEARRAY, but Vista's indexing service doesn't recognize it at all.  I'm trying to test on Sharepoint 2010, but still struggling w/ the install for that so haven't been able to yet 😛

      I have put in enough instrumentation to determine that indexing service only calls ::GetValue once instead of calling it multiple times until it finds no more values, so the only other thing it can return is a SAFEARRAY.

      Also, are there limitations on multivalue data types?  I.e., can it be a multivalue of ints, dates, etc. instead of only strings?  I've found references that multivalues can be strings, but nothing else…