Ever since I started working with filters, I've seen numerous questions along the lines of "What does proper validation of an IFilter mean? What tests should we run, and how do we execute them?" Hence, it's only appropriate that we publish a document detailing our rigorous test procedure so that everyone targeting components at MS Search products can benefit from it.
Disclaimer: The following list presents only a subset of the testing methodologies we apply at MS Search and is by no means meant to be a quick recipe for weeding out ALL security vulnerabilities in your filter. The list is meant to provide an overview of the issues one should think about while testing and implementing filters.
A. Architectural Considerations : - COMPLIANCE REQUIRED
1. The filter DLL does not require the client to be installed on the indexing machine.
2. The filter DLL does not make references to other binaries at compile time.
3. The filter DLL is monolithic and self-contained, without any external dependencies.
For an overview of the problems caused by non-monolithic DLLs, please see:
B. Threading Model : - COMPLIANCE REQUIRED
The filter's threading model must be marked as either "Both" or "Free" under:
We recommend using the "Both" threading model. An object that is marked with a threading model of "Both" takes on the threading model of the thread that created the object. Marking the threading model as "Both" requires that the filter be thread-safe.
C. OS Versions Supported :
The filter should support the following OS versions:
--> WinXP & Win2K3 : Filtering of <document format> should be checked with WDS 3.0.
--> For Vista and Longhorn, use the built-in search facility.
D. Backwards compatibility with SPS2003 :
1. Register the filter DLL with SPS 2003.
2. Create a content source with your documents, crawl and query.
E. Loading Mechanisms : - COMPLIANCE REQUIRED
The filter needs to support all three loading mechanisms for backward and forward compatibility reasons. We recommend trying to load via IPersistStream first and falling back to IPersistStorage or IPersistFile only if IPersistStream is not supported.
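The fallback order recommended above can be sketched roughly as follows. This is only an illustration, not the way any MS Search host actually loads filters: the `loaders` mapping and its callables are hypothetical stand-ins for "query the interface and call its load method", and the relative order of IPersistStorage and IPersistFile is an assumption, since the text does not rank them.

```python
from typing import Callable, Dict, Optional

# Preference order: IPersistStream first, as recommended in the text.
# The order of the two fallbacks is an assumption made for this sketch.
LOAD_ORDER = ("IPersistStream", "IPersistStorage", "IPersistFile")


def load_with_fallback(loaders: Dict[str, Callable[[], bool]]) -> Optional[str]:
    """Try each loading mechanism in order.

    `loaders` maps a mechanism name to a hypothetical callable that
    returns True when the filter loaded the document that way.
    Returns the name of the first mechanism that succeeds, or None.
    """
    for name in LOAD_ORDER:
        loader = loaders.get(name)  # None means "interface not supported"
        if loader is not None and loader():
            return name
    return None


if __name__ == "__main__":
    # A filter that exposes only IPersistFile still loads, via the fallback.
    print(load_with_fallback({"IPersistFile": lambda: True}))
```

A test harness can run this against a filter three times, each time exposing only one mechanism, to confirm that all three paths work.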
The IFilterExplorer can be used to check which loading mechanisms are supported:
F. Dedicated support for 64-bit platforms :
For 64-bit platforms, there should be no dependency on 32-bit binaries, i.e., nothing that has to run under WOW64.
Run <Depends.exe> to check that all dependencies are satisfied, to prevent runtime errors.
Known Issue: A dependency on MSJAVA.dll shows up in red in Dependency Walker. You can safely ignore this.
G. Code Coverage:
We recommend at least 70% code coverage. This can be measured easily with VS 2005 Team System.
H. IFiltTst - Consistency, Legitimacy and Illegitimacy tests:
IFiltTst can be used to run the following tests:
Consistency Test: The chunks emitted by the filter should be consistent between two runs.
Legitimacy Test: This test validates that the filter is initialized with a proper configuration and that GetText() and GetValue() function as expected.
Illegitimacy Test: In essence, this test tries to validate that the filter is well behaved by exercising inappropriate configurations during initialization and by calling GetText() on value-type chunks and vice versa.
Details of using IFiltTst can be found here: http://msdn2.microsoft.com/en-us/library/ms692580.aspx
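To make the idea behind the consistency test concrete, here is a minimal sketch that compares the chunks captured from two runs over the same document. It is not how IFiltTst is implemented. The tuple shape used for a chunk (id, kind, content) is a hypothetical simplification chosen for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical, simplified view of an emitted chunk:
# (chunk id, "text" or "value", content as a string)
Chunk = Tuple[int, str, str]


def first_mismatch(run_a: List[Chunk], run_b: List[Chunk]) -> Optional[int]:
    """Return the index of the first chunk that differs between two runs.

    Returns None when both runs emitted exactly the same chunk sequence.
    """
    for index, (a, b) in enumerate(zip(run_a, run_b)):
        if a != b:
            return index
    if len(run_a) != len(run_b):
        # One run emitted extra chunks after the common prefix.
        return min(len(run_a), len(run_b))
    return None


if __name__ == "__main__":
    run1 = [(1, "text", "Hello"), (2, "value", "Author=Me")]
    run2 = [(1, "text", "Hello"), (2, "value", "Author=You")]
    print(first_mismatch(run1, run1))  # identical runs
    print(first_mismatch(run1, run2))  # differs at the second chunk
```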
I. Security tests with Fuzzing :
1. Fuzz a minimum of 0.5 million documents of each format handled by the filter and feed them to FilterTest.
2. Keep PageHeap enabled throughout the fuzz test run.
3. Analyze any heap corruption, stack overflows, buffer overruns, crashes, etc., and fix the underlying bugs.
PageHeap can be enabled with AppVerifier. Download here:
NOTE: The fuzzer is an internal tool. A list of external fuzzers is provided here: http://www.infosecinstitute.com/blog/2005/12/fuzzers-ultimate-list.html
Again, use these at your own risk:)
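For readers without access to a fuzzer, the basic idea of mutation fuzzing can be sketched in a few lines. This is a deliberately naive illustration and not the internal tool mentioned above. The function name `mutate` and its parameters are made up for this example. A real fuzzer would also need format-aware mutations and crash triage.

```python
import random


def mutate(data: bytes, n_flips: int, seed: int) -> bytes:
    """Return a copy of `data` with up to `n_flips` bytes overwritten.

    Positions and replacement values come from a seeded generator, so any
    file that crashes the filter can be regenerated from (input, seed).
    """
    rng = random.Random(seed)
    buf = bytearray(data)
    if not buf:
        return bytes(buf)
    for _ in range(n_flips):
        pos = rng.randrange(len(buf))
        buf[pos] = rng.randrange(256)  # may equal the old byte by chance
    return bytes(buf)


if __name__ == "__main__":
    original = b"%DOC-1.0 sample document body"
    for seed in range(3):
        print(mutate(original, n_flips=4, seed=seed))
```

Recording the seed alongside each generated file keeps failures reproducible, which matters when the run covers hundreds of thousands of documents.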
J. Performance Scaling:
Optimal use of processors in a server environment is crucial for performance. The goal is to achieve 80% performance scaling with the addition of each new processor. Here's the test outline.
1. On a quad-processor machine, use ifilttst.exe with one thread to filter a large corpus of documents and note the time taken.
--> Now use ifilttst.exe with two threads to filter the same corpus. The time taken should be (0.556 * TIME FOR FILTERING WITH ONE THREAD), since 1/1.8 = 0.556.
--> With the addition of each subsequent thread, the new time T2 can be found with the formula:
T2 = T1 / (1.8 ^ log2(N)), where T1 is the time taken with one thread and N is the number of threads.
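As a quick way to turn the formula into target numbers, here is a small sketch that evaluates it for a measured single-thread time. The function name and the example time of 100 seconds are illustrative assumptions. Note that, as written, the formula applies the 1.8 factor once per doubling of the thread count.

```python
import math


def expected_time(t1: float, n_threads: int, scaling: float = 1.8) -> float:
    """Target filtering time for n_threads, given single-thread time t1.

    Implements T2 = T1 / scaling ** log2(N) from the text; scaling=1.8
    corresponds to the stated 80% gain.
    """
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    return t1 / (scaling ** math.log2(n_threads))


if __name__ == "__main__":
    t1 = 100.0  # hypothetical single-thread time in seconds
    for n in (1, 2, 4):
        print(n, round(expected_time(t1, n), 1))
    # prints:
    # 1 100.0
    # 2 55.6
    # 4 30.9
```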
K. AppVerifier Tests :
The AppVerifier tests seek to weed out critical security and performance defects. The tests should be conducted in 3 layers, with each layer executed in a separate test run. The layers are described below.
1. BASICS: Here check the
--> Exceptions - Ensures that the application does not hide access violations (AVs) using structured exception handling.
--> Handles - Ensures that the application does not attempt to use invalid handles.
--> Heaps - Checks for memory corruption issues in the heap.
SETTINGS: Full Page Heap
Dll : <IFilter Dll>
--> Locks - Verifies correct usage of critical sections and identifies potential deadlocks (timeout 7 minutes).
--> Memory - Ensures that APIs for virtual address space manipulation are used correctly.
--> Threadpool - Checks for dirty threadpool threads and other threadpool-related issues.
--> TLS - Ensures that Thread Local Storage APIs are used correctly.
The expectation for this scenario is that the application does not break into the debugger. This means that you have no errors that need to be addressed.
2. LOW RESOURCE SIMULATION: Accept the default settings. Filter a corpus (a large collection of documents) containing 10000+ files. Use IFiltTst to loop through the corpus, filtering the files. As long as we can get through the corpus without breaking into the debugger, it should be fine.
3. MISCELLANEOUS: Here check the
--> Dangerous APIs - Checks for proper usage of API calls such as "TerminateThread".
--> Dirty Stack - Detects uninitialized variables in future function calls in that thread's context.
Accept the DEFAULT settings here as well.
HOW TO RUN THE TESTS:
1. Start AppVerifier.
2. Add your application (IFiltTst) to AppVerifier.
3. Check off the tests mentioned above. You need to run the tests three times, once for each layer.
4. Save your application.
5. Set the PROPAGATE property to true -> this ensures AppVerifier settings are applied to any threads spawned by IFiltTst.
6. Run IFiltTst from the command line on a corpus containing 10000+ files.
7. Save the logs from the three runs.
Detailed information about using AppVerifier can be found here:
If the document format allows the language / locale of the contents to be marked (e.g. MS Word), filtering of documents marked with the above language tags must be verified. This is important because the filter emits locale information based on the language of the document, which MSSearch uses to invoke the correct word breaker and stemmer for the document.
M. Registry and File I/O:
1. Use Filemon.exe with the Filemon filter set to the name of your DLL and verify that the IFilter initiates no file system I/O other than on the documents it is indexing. Take special note if the filter is creating temp files.
2. Use Regmon.exe to verify that no registry read/write operations are performed.
www.sysinternals.com has both 32-bit and 64-bit versions of Filemon and Regmon.
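Once the accessed paths have been extracted from a monitoring log, the check in step 1 amounts to flagging anything outside the corpus directory. The sketch below shows that comparison only. It does not parse Filemon's log format, and the function name, the example paths, and the idea of a single corpus root are assumptions made for illustration.

```python
from pathlib import PureWindowsPath
from typing import Iterable, List


def unexpected_paths(accessed: Iterable[str], corpus_root: str) -> List[str]:
    """Return the accessed paths that do not live under corpus_root.

    PureWindowsPath compares case-insensitively, matching Windows path
    semantics, and works on any platform because it never touches disk.
    """
    root = PureWindowsPath(corpus_root)
    flagged = []
    for raw in accessed:
        path = PureWindowsPath(raw)
        if path != root and root not in path.parents:
            flagged.append(raw)
    return flagged


if __name__ == "__main__":
    seen = [
        r"C:\corpus\a.doc",
        r"C:\Windows\Temp\x.tmp",  # a temp file the filter should not create
        r"c:\CORPUS\sub\b.doc",
    ]
    print(unexpected_paths(seen, r"C:\corpus"))
```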
N. PREfix/PREfast for Vista :
On the Office team, OACR checks for this if we build with the Windows PREfast requirements. In other environments, however, we need to use the Visual Studio build Configuration Manager to enable PREfast error checking.
More info (MS employees):
PREFIX: internal website
PREFAST: wrapped in OACR
O. Calls to undocumented Windows APIs :
Run APIScan to ensure we do not make any calls to undocumented Windows APIs.
Note: This requirement is solely for MS and MS partners, to avoid situations like the Secret API fiasco.
P. SAL annotation :
SAL annotation is an excellent way to weed out potential security flaws in the code. More info at:
Q. UI Popups :
Use FiltDump to filter the document and ensure there are no UI popups.
R. International Sufficiency:
We've seen a lot of issues in the past where Unicode / DBCS characters were not handled correctly by IFilters and Protocol Handlers. The problem is a bit more serious in Protocol Handlers, as the address of the content source might be encoded in a DBCS charset, in which case data retrieval fails.
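The kind of failure described here is easy to reproduce in miniature: text encoded in a DBCS code page does not survive being decoded with a single-byte one. The sketch below is a generic illustration of that mismatch, not a reproduction of any specific IFilter or Protocol Handler bug. The sample string and the choice of code pages (cp932 and cp1252) are assumptions picked for the example.

```python
def round_trips(text: str, encode_as: str, decode_as: str) -> bool:
    """True if `text` survives encoding with one codec and decoding with another."""
    try:
        return text.encode(encode_as).decode(decode_as) == text
    except UnicodeError:
        # Either the text cannot be encoded, or the bytes cannot be decoded.
        return False


if __name__ == "__main__":
    sample = "検索"  # Japanese for "search"; encodable in cp932 (Shift-JIS)
    print(round_trips(sample, "cp932", "cp932"))   # matching code pages
    print(round_trips(sample, "cp932", "cp1252"))  # mismatched code pages
```

A test corpus for this item should therefore include documents and content source addresses containing characters outside the ASCII range.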
S. Security Code Review:
This is the final line of defense against introducing security bugs in your code. DO NOT be skimpy on this!!! 🙂