Chronicles of an IFilter development – inception to deployment.

I often asked myself the question – How do independent vendors develop IFilters for MS Search Products and what are the challenges they face? It occurred to me that if I could somehow document the development lifecycle of an IFilter developed by someone other than Microsoft, it’d probably provide answers to a lot of baffling questions facing independent vendors developing these components.

I recently had the opportunity to have a detailed discussion with Marco van Schagen from CAD & Company, who has just embarked on a fascinating voyage of refactoring the CAD (DWG) IFilter.

This thread is meant to address the issues faced by Marco which may be of broader interest to several of us implementing our own filters.

 In Marco’s own words:

Currently I am planning a new version of our 2005 version DWG iFilter. This is to support the newer 2007 DWG file format, and address questions on its operation with SQL 2005 and MOSS (SharePoint) 2007.

If you would like more information on this product, please visit

As I am inexperienced with iFilter development, I have many questions to find answers for.

In my preparations, Deb Haldar has provided me with crucial information to help me get on the right track. I would like to share this information in his blog to help make this the “One stop shop” for IFilter related issues. Probably a separate thread will be created to track the development cycle of an iFilter from scratch.

The existing 2005 version iFilter project is coded mostly in C++, in VC 6.0.

Some questions I’d like to discuss:

– Should we use C++ or transfer to dot Net and why?

– What is required for coding a proper iFilter

– Testing for Multithreading compatibility

– Registration with Sharepoint and SQL 2005

I would like to start sharing my information soon, and I am interested in your comments.

Marco van Schagen


Comments (55)

  1. Marcovanschagen says:

    Deb, thank you. I will start posting very soon.

  2. Marcovanschagen says:

    As I have no experience on the subject, I am browsing the web for more information on how an iFilter should be built.  It seems some people would like to rebuild their implementation in dot Net while others explain why they should not do this, leaving me confused.

    Our current implementation is done in C++, so I asked Deb Haldar from Filter Central for help.

    His answer on the subject is very clear:

    It’s preferable to write your filter in C++ as all the MS Search services are implemented as COM components. So if you use C# to write your filter, you’ll have to go through the .Net Interop every time you invoke any of the filter methods, which can be a severe performance hit.

  3. Marcovanschagen says:


    In our implementation we have a config loader written in VB dot Net. This is called each time the iFilter is called. If I understand you right, calling this dot Net section from our unmanaged C++ iFilter also causes a performance hit?

    If so, and if I cannot avoid this performance hit anyway, would I make things much worse if I write the iFilter itself in dot Net?

  4. Deb Haldar says:


    Is your config loader used to just initialize the filter with appropriate parameters required by IFilter::Init()? If so, I’d still retain the C++ implementation as once the filter is initialized, it’ll not have to go through Interop anymore. Typically Init() is called once whereas the other IFilter methods can be called thousands of times depending on the size of the corpus/document.

  5. Marcovanschagen says:


    You are absolutely right, the config loader is called only once.

    So, if I would like to call some code through COM Interop, that is perfectly okay. I just need to be very careful to do this only once in the IFilter’s lifetime, and not, for example, for each chunk or value.
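The "pay the interop cost once" idea can be sketched in portable C++. This is only an illustration under stated assumptions: `FilterConfig`, `SketchFilter`, `GetChunkLabel` and `countLoaderCalls` are hypothetical names, and the `loader` callback merely stands in for the expensive call into the VB.NET config loader. It is not the DWG IFilter's actual code.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for the settings the config loader returns.
struct FilterConfig {
    std::string extractMode;
};

class SketchFilter {
public:
    // 'loader' stands in for the expensive cross-boundary (interop) call.
    explicit SketchFilter(std::function<FilterConfig()> loader)
        : loader_(std::move(loader)) {}

    // Called once: pay the interop cost here and cache the result.
    void Init() { config_ = loader_(); }

    // Called many times: uses only the cached copy, never the loader.
    std::string GetChunkLabel(int chunkId) const {
        return config_.extractMode + "#" + std::to_string(chunkId);
    }

private:
    std::function<FilterConfig()> loader_;
    FilterConfig config_;
};

// Counts how often the loader runs for a document with 'chunks' chunks.
int countLoaderCalls(int chunks) {
    int calls = 0;
    SketchFilter filter([&calls]() -> FilterConfig {
        ++calls;
        return FilterConfig{"text"};
    });
    filter.Init();
    for (int i = 0; i < chunks; ++i) {
        filter.GetChunkLabel(i);
    }
    return calls;
}
```

However many chunks the document produces, the loader runs a single time, which is the property Deb describes for Init() versus the per-chunk methods.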

  6. Marcovanschagen says:

    I asked Deb if he can show specifics to keep an eye on while coding for an IFilter implementation. I am posting my questions and his answers in one big post here.

    Would you be able to publish some best practices information, guidelines and templates – or suggest some good ones?

    <Deb>The first time I had to write a filter, I started by looking at the IFilter interface in MSDN. A good article that one of the key MS Search architects wrote can be found here:

    Once you’ve written your filter, I’d highly recommend testing it with ifilttst.exe which ships with Win2K3 and documentation can be found here:

    Most of the problems I’ve come across from our consultants regarding IFilters can be broadly categorized into two issues:

    1. Multithreading compatibility: This is critical for enhancing the performance of server products.

    Be sure to test your IFilter for multithreading using IFilttst.exe on a multi-processor machine.

    2. Registration issues with Sharepoint and WDS: The correct registration steps are given at

    Recently I had to debug the DWG filter registration issue on one of <a big company>’s servers. So I know it works for sure with Sharepoint 2007. For the other MS Search products, I’d appreciate if you can post your test results on the IFilter blog:)



    My colleague who built the current version hinted to me that some users have had issues with multithreading. They tell us most of the time it works fine. It would be great if your blog could point to some do’s and don’ts in code.

    Some C++ coding examples showing good and bad practice might help a lot of users. I have no experience in C++, and having good examples help me feel more confident in my choices.

    <Deb> Most multithreading issues can be pinpointed by running ifilttst.exe with multiple threads on a multiprocessor machine. Two key things to keep in mind are:

    1. Make sure all your shared resources (file shares, database access, static objects) are well guarded.

    2. Set the COM threading model to “Both”. An object that is marked with a threading model of “Both” takes on the threading model of the thread that created the object. This is the way we recommend filters are registered with the search service. Marking the threading model as “Both” requires the filter to be thread-safe.

    I do not have any ready recipes at this point for identifying multithreading issues. However, I found the following two resources particularly useful:

    1. Inside COM by Dale Rogerson, Ch. 12

    2. Multithreading Applications in Win32: The Complete Guide to Threads by Jim Beveridge and Robert Wiener
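As a small illustration of "guarding" a shared object, here is a portable C++ sketch using `std::mutex`. The `SharedStats` class and the `hammer` helper are hypothetical and not taken from any real filter; in an actual IFilter DLL the shared object would more likely be a static that every filter instance touches, and the Win32 synchronization primitives covered in the books above could be used instead.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical shared resource: a count of filtered documents per
// extension, touched by every filter instance.
class SharedStats {
public:
    void record(const std::string& ext) {
        std::lock_guard<std::mutex> lock(mutex_);  // one writer at a time
        ++counts_[ext];
    }

    int count(const std::string& ext) const {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = counts_.find(ext);
        return it == counts_.end() ? 0 : it->second;
    }

private:
    mutable std::mutex mutex_;
    std::map<std::string, int> counts_;
};

// Simulates several indexer threads hitting the shared object at once
// and returns the final count.
int hammer(int threadCount, int perThread) {
    SharedStats stats;
    std::vector<std::thread> workers;
    for (int t = 0; t < threadCount; ++t) {
        workers.emplace_back([&stats, perThread] {
            for (int i = 0; i < perThread; ++i) {
                stats.record("dwg");
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return stats.count("dwg");
}
```

With the lock in place the total always equals `threadCount * perThread`. Removing the `lock_guard` lines turns this into a data race, which is the kind of defect a multi-threaded ifilttst.exe run is meant to expose.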



    Deb, I am very interested in your findings on the DWG IFilter at <a big company>.

    I am interested in the details: did their implementation index database content or only a file share (and would there be a difference)?

    Also, I am surprised, as our current version was not prepared for Sharepoint 2007. Going by the common internet discussions, I expected that changes would be needed to make it work on SQL 2005.

    <Deb> The server effectively indexed Exchange Server and file shares. However, if the proper filter is registered correctly, there should be no difference in indexing: once the data is fetched from a content source by the appropriate protocol handler, the filter should process the data correctly.

    In other words, the type of content source does not affect filtering.

    The DWG filter did not work as-is on the server. We had to tweak some registry settings for it to work. It’s probably the same issue with SQL server, but I do not have a definitive answer:)

  7. Marcovanschagen says:

    We are making good progress on our new DWG IFilter 2007 development project.

    Here is an update, for those interested in the DWG IFilter:

    I have completed the switch to a different library to read the dwg files, so now we are fully compliant with AutoDesk. This means we now read dwg files up to the 2007 file format.

    The IFilter specific part of the code is still based on the smpfilt example.

    The next step is to finish all other parts and create the completed product. If the tests are successful we will create an installer package ready for shipping.

    You can expect the product to be available as planned in 2007 Q1. If anyone is interested in beta testing, please drop me an email.

    We are now collecting requirements for our next release, DWG IFilter 2008. If required we will add support for the 2008 dwg file format. Also we will add support for indexers where not available in our 2007 version. Also, this release is expected to better support SQL 2005.

    Customers who obtain the 2007 version can expect to update to the 2008 version free of charge.

    Currently I am collecting information to start planning the 2008 version.

    Deb, I do have a question on registration.

    In my current implementation, registration of our filter is based on FiltReg.Hxx, as provided in the original smpfilt example. This works like a charm, and I expect this is the perfect solution for most occasions.

    I wonder to what extent this FiltReg.Hxx also supports the newer indexers, and for which products I need a different approach. It would be great to have an updated version available if needed.

  8. Marcovanschagen says:

    Deb, I would like to know about registration for SQL 2005.

    As you mention here, Shajan has good information on registering an IFilter for SQL 2005. This registration for SQL 2005 seems to be a bit of a complex thing.

    Would there be some helper code or macro available for this task? I do like the FiltReg.Hxx approach as provided with the smpfilt example.

    I hope these tasks can be fully automated (as with FiltReg), if not, what manual steps do you expect we need to document for our users?

    Is it required to copy the filter dll file to specific locations for each SQL server instance, or could this be rerouted to our install folder?

    As posted here and here, it sounds like it is required to create a signed IFilter dll. I believe dll Authenticode signing is described here at “Project: Authenticode Signing with a Test Certificate” and here.

    Can you confirm this information to be applicable to IFilters in SQL 2005? What kind of certificate would we need? Do we use certificates we create ourselves, or is it preferable to obtain one from a certificate provider?

    By the way, is a certificate also required for 64 bit installations? I heard those environments are more restrictive on what 3rd party software is accepted.

  9. Deb Haldar says:

    Marco, it’s good to know your product is coming along well!

    For registering the filter, the self registration code kicked off by regsvr32 takes care of the indexing service registration. If you are targeting specific MS Search products, then you need to make sure their registration requirements are also satisfied. For MOSS and WSS, this info is already posted on the blog. Currently, MOSS falls back to indexing service registration if it does not find specific filters registered for the content it’s crawling. However, this is not a recommended solution as the fallback on indexing service may not be supported in future versions of the product. Please ensure that any keys not set by regsvr32 for the targeted product are set by your installer package.

    In the near future the Enterprise Search group will be the sole distributor of filters to other MS Search products through the filter pack initiative. At that point, we’ll be able to provide a more concrete and unified solution to the registration issue.

    As a sidenote, WDS 3.0 loads via IPersistStream and hence the smpfilt might not work there. It’d be a good idea to double check this is not the case with the DWG filter.

    Also, a good way to debug registration issues without the pain of hooking the debugger is using Regmon from sysinternals:


  10. Marcovanschagen says:


    I am wondering which MS Search products and versions we could be targeting now.

    So it would be great to publish a table listing the Search products and versions, with their IFilter registration issues and requirements. Possibly there will be differences among platforms, or 32/64 bit versions.

    We would like to see if each will accept default registration, as provided with FiltReg (even if only for backward compatibility). Where will we run into problems if we stick to the smpfilt example?

    Is digital signing required, optional, or not applicable? Are we forced to keep an additional copy of the IFilter dll in a certain location? You could possibly add many more items.

  11. Marcovanschagen says:


    For a future release, I’d like to have my IFilter return a different response depending on the Search product that is being used.

    If I were able to differentiate here, it would be possible to allow an unregistered, free version to be used for desktop filesystem use only, for example.

    A registered version would then be required in a server environment, for example, with certain search products.

    Would you have an idea how I could do this?

  12. Deb Haldar says:

    Marco, I just verified with our Security contact that “Digital Signing is a requirement”. We recommend that you use a trusted certificate provider such as Verisign. This is the company which currently countersigns all our vendor binaries as well. Here’s a complete list of MS approved certificate providers:

  13. Deb Haldar says:

    The only way that I can think of is as follows:

    Determine the id of the calling process and map to the calling process name, such as <mssearch> for MOSS or <cisvc> for indexing service.

    However, if the service name changes from one version of the targeted product to another, the DWG filter will break.
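A minimal sketch of the name-mapping step, in portable C++. It assumes the host executable's path has already been obtained (on Windows, `GetModuleFileNameW` with a NULL module handle returns it). `HostProduct`, `baseNameLower` and `classifyHost` are hypothetical names, and the two process names are simply the ones quoted above. The filter may in fact be hosted by a daemon process such as the cidaemon.exe mentioned later in this thread, so the table would need verifying against each product.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

enum class HostProduct { Moss, IndexingService, Unknown };

// Returns the lower-cased file name without directory or ".exe" suffix.
std::string baseNameLower(const std::string& path) {
    const std::string::size_type slash = path.find_last_of("\\/");
    std::string name =
        (slash == std::string::npos) ? path : path.substr(slash + 1);
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::tolower(c));
                   });
    const std::string suffix = ".exe";
    if (name.size() >= suffix.size() &&
        name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0) {
        name.erase(name.size() - suffix.size());
    }
    return name;
}

// Maps a host executable path to a search product. The names in this
// table are assumptions taken from the discussion, not a verified list.
HostProduct classifyHost(const std::string& exePath) {
    const std::string name = baseNameLower(exePath);
    if (name == "mssearch") return HostProduct::Moss;
    if (name == "cisvc") return HostProduct::IndexingService;
    return HostProduct::Unknown;
}
```

Keeping the mapping in one small function at least confines the breakage to a single place if a service is renamed; an unrecognized host falls through to `Unknown` rather than failing.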

  14. Deb Haldar says:
    To make the question posed by Marco suitable for a broader audience, the issue has been broken down into three parts. John Kane, who has worked with IFilters in all versions of SQL Server since FTS was first incorporated in SQL Server 7.0 Beta3, was kind enough to share his experience with us.


    <Deb> For registering a filter with SQL server, is it necessary to copy the filter dll file to specific locations for each SQL server instance, or could this be rerouted to the install folder?

    <John> Neither SQL Server 2000 nor SQL Server 2005 requires the IFilter dll to be stored in any specific location, just that it be properly registered and working. However, SQL Server 2005 FTS has added a new level of security for 3rd party developed ISV IFilters. Specifically, SQL 2005 FTS requires certain settings be disabled for security reasons via: “sp_fulltext_service” – “Enabling use of OS resources provides access to resources for languages and document types registered with Microsoft Indexing Service that do not have an instance-specific resource installed”

    sp_fulltext_service 'verify_signature', 1

    sp_fulltext_service 'load_os_resources', 0

    Additionally, you will need to stop and restart the SQL Server and MSFTESQL services after the above changes are made. Then you can query the sys.fulltext_document_types system table to get a list of all IFilters used by that instance of SQL Server 2005. The above steps are not required by SQL Server 2000 FTS as it will use all successfully installed and registered IFilters.

    Furthermore, according to the following related SQL 2005 KB article, if you have ‘verify_signature’ enabled and the server does not have internet access, you will want to disable it via: sp_fulltext_service 'verify_signature', 0 – “You may experience a 45-second delay when you run a full-text query in an instance of SQL Server 2005 that is running on a server without Internet access”

    <Deb> Do we require third party filter dlls to be signed before they can be consumed from within the search services in SQL?

    <John> Yes, the IFilters are required to be signed for SQL Server 2005, but not for SQL Server 2000.

    <Deb> If we have multiple search products installed on a single machine, is there a deterministic way for the filter dll to know which search service invoked the filter dll? I suppose we can always inquire for the name of the calling process but since the name of the search services keeps on changing, a more elegant solution would be ideal.

    <John> Citeknet provides a very good “IFilter Explorer” that identifies multiple search products installed on a single machine and which Ifilter is related to each search product.


    John can be reached at SQL 2005 FTS blog.

  15. Marcovanschagen says:

    In various places on the web, we have been told to start supporting the IPersistStream interface. Also, I have seen remarks leading me to believe the easy approach is to support IPersistFile or IPersistStream. I tried to do both, and I’ll explain how.

    First of all, I needed to see how the search indexers accept my IFilter implementation, so I downloaded and installed IFilter Explorer 2.0. I was amazed how much insight this tool gives us into how our filter may work with the different search indexers.

    In the ‘Windows Search 3.0’ tab, my filter implementation shows up with a red marked ‘no’ in the IPersistStream column. Apparently my WDS would like my filter to expose the IPersistStream interface.

    Now, how to add such an interface? Well, I needed some sample code, for I am not experienced with C++ and COM interfaces. Searches on the internet did not turn up any such examples for IFilter implementations at all, which makes it all the more useful to describe how I have done this.

    First, I listed the additional items in the interface, and added them to the main header file:


    class DWGIFilter2007 : public IFilter, public IPersistFile, public IPersistStream
    {
    public:
       // From IUnknown
       virtual  SCODE STDMETHODCALLTYPE  QueryInterface( REFIID riid, void  ** ppvObject );
       virtual  ULONG STDMETHODCALLTYPE  AddRef();
       virtual  ULONG STDMETHODCALLTYPE  Release();

       // From IFilter
       virtual  SCODE STDMETHODCALLTYPE  Init( ULONG grfFlags, ULONG cAttributes, FULLPROPSPEC const * aAttributes, ULONG * pFlags );
       virtual  SCODE STDMETHODCALLTYPE  GetChunk( STAT_CHUNK * pStat );
       virtual  SCODE STDMETHODCALLTYPE  GetText( ULONG * pcwcBuffer, WCHAR * awcBuffer );
       virtual  SCODE STDMETHODCALLTYPE  GetValue( PROPVARIANT ** ppPropValue );
       virtual  SCODE STDMETHODCALLTYPE  BindRegion( FILTERREGION origPos, REFIID riid, void ** ppunk );

       // From IPersistFile
       virtual  SCODE STDMETHODCALLTYPE  GetClassID( CLSID * pClassID );
       virtual  SCODE STDMETHODCALLTYPE  IsDirty();
       virtual  SCODE STDMETHODCALLTYPE  Load( LPCWSTR pszFileName, DWORD dwMode );
       virtual  SCODE STDMETHODCALLTYPE  Save( LPCWSTR pszFileName, BOOL fRemember );
       virtual  SCODE STDMETHODCALLTYPE  SaveCompleted( LPCWSTR pszFileName );
       virtual  SCODE STDMETHODCALLTYPE  GetCurFile( LPWSTR  * ppszFileName );

       // Additional methods from IPersistStream
       // IsDirty(void) is already declared above and serves both interfaces
       virtual  SCODE STDMETHODCALLTYPE  Load( IStream * pStm );
       virtual  SCODE STDMETHODCALLTYPE  Save( IStream * pStm, BOOL fClearDirty );
       virtual  SCODE STDMETHODCALLTYPE  GetSizeMax( ULARGE_INTEGER * pcbSize );

       /////// all the private stuff did not change
    };


    Effectively, the Load, Save and GetSizeMax functions need to be added to the implementation with these parameters. Having several definitions with the same function name but different parameters is no problem; this is called overloading.

    You need to add your implementation of these functions in your code. Check MSDN on what is expected here; for now you can add them without much code inside the function.
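As a rough idea of what "without much code inside" can look like, here is a sketch of the stream-side stubs for a filter that only ever reads documents. It is written so that it compiles without the Windows SDK: every `*_SKETCH` name and `IStreamSketch` are stand-ins for the real `HRESULT`, `S_FALSE`, `E_NOTIMPL` and `IStream`, and `ReadOnlyPersistStubs` is a hypothetical class, not part of the DWG IFilter. Returning E_NOTIMPL from Save and GetSizeMax is one common choice for read-only filters, not a requirement confirmed in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins so the sketch compiles without the Windows SDK. In a real
// filter these come from the Windows headers; the numeric values match
// the documented S_FALSE and E_NOTIMPL codes.
typedef std::int32_t HRESULT_SKETCH;
const HRESULT_SKETCH S_FALSE_SKETCH = 1;
const HRESULT_SKETCH E_NOTIMPL_SKETCH =
    static_cast<HRESULT_SKETCH>(0x80004001u);
struct IStreamSketch;  // opaque stand-in for IStream

class ReadOnlyPersistStubs {
public:
    // The filter never modifies the document, so it reports "not dirty".
    HRESULT_SKETCH IsDirty() { return S_FALSE_SKETCH; }

    // Saving is not something a read-only filter supports.
    HRESULT_SKETCH Save(IStreamSketch* /*stream*/, bool /*clearDirty*/) {
        return E_NOTIMPL_SKETCH;
    }

    // No persisted state, so there is no size to report.
    HRESULT_SKETCH GetSizeMax(std::uint64_t* /*size*/) {
        return E_NOTIMPL_SKETCH;
    }
};
```

In the real class these bodies would sit behind the `SCODE STDMETHODCALLTYPE` declarations shown above, with `IStream *` and `ULARGE_INTEGER *` parameters.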

    Now, this does not yet show any differences in the IFilter Explorer. The new interface needs to be exposed through QueryInterface in your implementation. Note you have two of these; just keep the one in the classfactory class unchanged.

    You need to add one line to be able to return an IPersistStream interface:


       if ( IID_IFilter == riid )        pUnkTemp = (IUnknown *)(IFilter *)this;

       else if ( IID_IPersistFile == riid )    pUnkTemp = (IUnknown *)(IPersistFile *)this;

       else if ( IID_IPersistStream == riid )  pUnkTemp = (IUnknown *)(IPersistStream *)this;

       else if ( IID_IPersist == riid )        pUnkTemp = (IUnknown *)(IPersist *)(IPersistFile *)this;

       else if ( IID_IUnknown == riid )        pUnkTemp = (IUnknown *)(IPersist *)(IPersistFile *)this;

       else    {  *ppvObject = NULL; return E_NOINTERFACE;  }


    Now, build and register your filter and check IFilter Explorer. A nice ‘yes’ should now show up in the IPersistStream column.

  16. Marcovanschagen says:

    In my VS2005 project I have created different configurations for the different debug modes I need to use to run the filter. The standard Debug configuration I use to debug the IPersistFile implementation with IFiltTst.Exe.

    This method is applicable probably to most IFilter developers. I use this on my C++ project, it may also work fine on other types.

    How to set this debugging:

    In your Project properties window, open ‘configuration properties’, ‘debugging’; and enter the following settings:

    Debugger to launch:  Local Windows Debugger

    Command: C:\<your path here>\IFiltTst.Exe

    Command arguments: /i "D:\<your project test folder>\<good file>.<extension to filter>" /v 3 /t 5 /l /d

    Working directory: D:\<your project test folder>

    Attach: No

    Leave all other settings as default.

    Oh, you need to allow Edit & Continue if you like:

    In your Project properties window, open ‘configuration properties’, ‘C/C++’, ‘General’. At ‘Debug Information Format’, select Program Database for Edit & Continue (/ZI). Remember not to use this setting for your final production build.

    Now, just set a breakpoint and press the Run button. You can now step through your code and check your variables. I find this very helpful in my IFilter R&D and development.

  17. With the public release of Vista a week back, soon developers will be wondering how to write and debug

  18. Deb Haldar says:

    Great post Marco! Additionally, please check out the following posts:

    1. Debugging IFilters with WDS 3.0 and Windows Vista.

    2. Debugging IFilters in MOSS/WSS.

  19. Deb Haldar says:

    Excellent info here Marco! Your IFilter skeleton should provide a good reference for anyone writing this from scratch. Here’s a little elaboration of the Load method while using IPersistStream:

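    SCODE STDMETHODCALLTYPE CMyFilter::Load(IStream* ps)
    {
        if (ps == NULL)
            return E_INVALIDARG;

        // Keep the stream for later use by GetChunk/GetText.
        // Assuming m_pIStream is a raw IStream* member: take a reference
        // here and Release() it when the filter is destroyed.
        m_pIStream = ps;
        m_pIStream->AddRef();

        return S_OK;
    }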


  20. Marcovanschagen says:

    To be able to add stream support to my DWG IFilter, I needed to convert the stream into a file. Our DWG library supports file based actions only (as far as I can tell), and I cannot find another way out since we really want to use this library.

    I need to warn you, the use of temp files from within your IFilter code is **NOT** desired. The search products are set up to try and disallow any write actions onto the file system. Any newer release of the search product may in fact be more successful in blocking such write actions.

    So, I needed to find a good way of getting my stream into a file. I will describe my findings here so you may easier find your way if you would ever need to handle such a problem.

    While developing and debugging, the code runs under my account. This enables the code to create and access temp files. This is good; there are some nice functions available to get a new unique name, create a file in the temp location, and automatically clean up the file afterwards. So all goes well until I make my development code run in a real WDS. It just refuses to work. Actually, after finding the WDS temp folder and allowing access to some accounts (find them using Process Monitor), everything is up and running. However, after a restart, my added access rights disappear from the WDS temp folder. So, this is not an operational option.

    In the end, I decided to have my own temp folder; setting some rights there.

    You need to write some code to ensure a reasonably unique filename, and also make sure you clean up after you are done using the file.

    Additionally, most important, you need to check and remove all old left-over files, since nothing is worse than being the developer who fills up every user’s harddrive with junk!

    I chose file names based on the system timer, in seconds since 1970. This way I can easily see if there are any files older than, say, 10 seconds. I believe this is all I need – I’ll just test this for another little bit.
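The naming and "older than N seconds" check can be sketched as pure string logic in portable C++. All names here are hypothetical: `makeTempName`, `parseTimestamp` and `isStale` are not from the DWG IFilter, and the `_<sequence>` suffix is an added assumption to reduce collisions when two files are created within the same second. Scanning the folder and deleting the stale files is left out.

```cpp
#include <cassert>
#include <string>

// Builds a name such as "1170000000_3.tmp" from the time in seconds
// since 1970 and a caller-supplied sequence number.
std::string makeTempName(long long nowSeconds, unsigned sequence) {
    return std::to_string(nowSeconds) + "_" + std::to_string(sequence) + ".tmp";
}

// Extracts the leading timestamp. Returns false when the name does not
// start with 1 to 18 digits followed by '_', i.e. it is not one of ours.
bool parseTimestamp(const std::string& fileName, long long& outSeconds) {
    const std::string::size_type sep = fileName.find('_');
    if (sep == std::string::npos || sep == 0 || sep > 18) return false;
    long long value = 0;
    for (std::string::size_type i = 0; i < sep; ++i) {
        const char c = fileName[i];
        if (c < '0' || c > '9') return false;
        value = value * 10 + (c - '0');
    }
    outSeconds = value;
    return true;
}

// True when the file was created by this scheme more than
// 'maxAgeSeconds' ago and is therefore a left-over to delete.
bool isStale(const std::string& fileName, long long nowSeconds,
             long long maxAgeSeconds) {
    long long created = 0;
    if (!parseTimestamp(fileName, created)) return false;  // leave foreign files alone
    return nowSeconds - created > maxAgeSeconds;
}
```

Files whose names do not match the scheme are never reported as stale, so a cleanup pass built on `isStale` would not touch anything the filter did not create.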

    Oh, please do not forget to exclude your temp folder from WDS indexing – or indexing will happily try to index and reindex its own temp files.

    Now just hope the search people don’t read this and disable all write access 😉

  21. Deb Haldar says:

    This is excellent information Marco !!! I’m sure all the folks implementing Stream based components for WDS will be really appreciative:)

  22. kert says:

    fyi, i am writing an ifilter for an XML-format file type that contains other common office file types , text, doc, pdf and so on. to "subfilter" those i load the filters for corresponding files, and attempt to do it over ipersiststream first.

    often, the "subfilter" doesnt support that, and i will fall back to writing the contained document into a tempfile and load subfilter on that temp file.

    what other alternatives do i have ?

    btw, i have everything working beautifully under ifilttst, filtdump and so on, but .NET clients absolutely refuse to load my filter, apparently because of threading models mismatch or something .. from what i gather the client insists on getting the imarshal interface or something. i havent quite figured that out yet.

    I have no problems writing threadsafe code per se, but debugging apartment/com threading issues is a b*tch

    It would be useful if there was an ATL object template with appropriate threading model stuff specified (i.e. do i use CComMultiThreadModel as the root object, do i set aggregation and so on ? )

  23. Marcovanschagen says:


    Did you check with ProcMon, or Process Monitor from the sysinternals tools for anything that gets denied? Just when I started to do so I noticed testing and operational use is done within different security context. I noticed my filter starts loading and then breaks just when trying to write the temp file.

    If your problem is different from mine; it may also apply to my situation; so I am curious what solution you will find.

    For temp file workarounds, I found some articles on file-in-memory or RAM-drive-like solutions, and also on something like making a named stream that you can access file-based through a URL. For none of these did I find nice sample code to start trying with, but this may get you closer to your solution.

    Please keep us posted on your findings?

  24. kert says:

    Well, i found out that .NET actually loads the filter, it just doesnt get along with it. I hadnt set the "Image File Execution Options" to debug the DLL when running under .NET, once i set this, and defined my custom ClassFactory ( i.e. copied the CreateInstance code from ATL::CComClassFactory ), i could hit breakpoints.

    Turns out, that the .NET is trying to aggregate my object in a weird way ..

    In particular, im using the sample code from here:

    It uses LoadIFilter on a file to load the filter .. with a controlling IUnk passed in. Now when i do this same thing from my C++ code, i get the S_OK back from the LoadIFilter function, and can step through my object creation process in ATL successfully.

    But when the same thing is done from under .NET, i see that in classFactory it actually tries to create my object with IPersistFile iid passed in, which at least according to ATL is not kosher. ( i wouldnt worry a wee bit, but the same .NET example loads office and pdf filters just fine )

    Last time i checked, under .NET the code gets back either "interface not aggregatable" or "no such interface" depending on whether i use my own classfactory hack or not.

    Anyway, thats approximately where i stand at. I dont understand how and what kind of object should be aggregated here .. and how should i support it in my code.

    I tried lots of tricks, did declare IPersistFile as an aggregateable interface on my object with a dummy object aggregated and all sorts of mumbojumbo but without thorough understanding of the intended outcome im at loss.

    BTW, the filter that i am writing will be on Sourceforge shortly under BSD licence, all ATL code under VS2005

    Id like to get it working in all the test apps though, first.

    also, i can be contacted directly at my blog link

  25. Deb Haldar says:


    1. Can you confirm that when the CLR is loading the filter, the pUnkOuter passed to the CreateInstance method is NULL? If it is not, the CLR tries to aggregate the component. A workaround might be to put the aggregatable attribute on your filter coclass. Please see for details.

    2. If you do not intend to aggregate the component, insert the DECLARE_NOT_AGGREGATABLE macro into your filter class definition and modify the call that loads the filter (LoadIFilter in your sample) to pass NULL for the controlling IUnknown.

    One further consideration:

    3. Are you defining it as an STA? Check stdafx.h for #define _ATL_APARTMENT_THREADED. Note that both offfilt and the PDF IFilter have their threading model set to Both in the registry. The filter’s registry setting and the ATL threading model have to be in sync.

  26. Marcovanschagen says:

    Hi all,

    Our development of the DWG IFilter has had some good progress. We will have a first round of beta testing with external parties next week- and then another round for everyone interested. I hope to release the product around start of April.

    If you would like to join the beta testing, drop us an email at

  27. Deb Haldar says:

    This is great news Marco! I’m sure a lot of our consultants will be eager to Test Drive the DWG IFilter – I’ve seen a number of questions being asked as to when it’ll be available:)

  28. Marcovanschagen says:

    Deb, I am happy to hear there is a demand for the new dwg IFilter. I would be happy to have some more consultants join our first beta round. I am preparing this first beta **today** and will be sending out the info today or tomorrow :).

  29. Marcovanschagen says:

    At some point, windows started popping up messages/errors titled Data Execution Prevention – Microsoft Windows. To help protect your computer, windows has closed this program. Name: Indexing Service filter daemon. Publisher: Microsoft Corporation.

    What is going on? The info at MSDN explains DEP. I think it was recently installed on my machine through one of the service packs or updates. As I understand it, it basically checks whether code gets executed in memory blocks that are marked for data. The running applications should do this marking for execution or data. And as I use VS2005, I believe my dll is doing all this marking for me.

    My way of dealing with this, is to change DEP settings:

    Control panel / system / advanced / performance settings / Data Execution Prevention tab -> except; check all unchecked checkboxes for indexing entries or choose DEP only for essential Windows programs and services.

    I hope this also helps you if you run into this problem. I would like another way of dealing with this, since I do not feel like telling all my users they need to make these adjustments. Suggestions are very welcome!

  30. Marcovanschagen says:

    I have been bumping into repeated cidaemon errors since I started this IFilter development. I get cidaemon.exe – application error. The Instruction at "0x00…" referenced memory at "0x00000000". The memory could not be "written". Click on OK to terminate the program. Click on Cancel to debug the program.

    At some point I traced it back to a line of C++ code in GetText(..) that copies data from one memory string buffer to another. Also I got some "Buffer too small" errors on this section. So I started using another command to copy memory, added more debugging info, and dumped the memory contents into the application log to help track the problem. Now the problem just moved, I believe to somewhere inside Cidaemon.exe. I don’t understand: is cidaemon refusing my GetText(..) returns, am I abusing memory, do I need to limit string size going into cidaemon, what is going on? Why doesn’t cidaemon like me 😉 ??

    Help…! Any suggestions here are very, very welcome.

    The latest thing I want to tell my users it to turn off windows Indexing Service. The IFilter I am building is supposed to be used by all kinds of indexing tools. Yeah, you’d better use WDS and turn off indexing service so you do not double index your files; but that choice should be up to you.

    Oh, to manage indexing service you go to: my computer / right click -> manage / services and applications / Indexing Service.

    To stop indexing servioe: my computer / right click -> manage / services and applications / Services -> Indexing service; set startup type to manual or none.

    You may re activate this, when you right click a folder, properties, general tab, advanced; and select For fast searching, allow Indexing Service to index this folder. Note: This is NOT indexing for WDS. With also WDS installed you may be double indexing your files.

  31. Marcovanschagen says:

    In my DWG IFilter Configurator tool I’d like to add some checkboxes to allow for easy registration in target indexers. It seems we can register for MOSS 2007, WSS 3.0 and SQL 2005 without having the self-registration entries in the standard Indexing Service 3.0 style. This lets me prevent Indexing Service from picking up the IFilter, avoiding a performance hit on your system. It also lets me avoid my CiDaemon issue.

    I’d like to add a similar registration with WDS 3.0 for the same reasons. As I also am a simple user sometimes, I currently have both the standard 2000/2003/XP Indexing Service and WDS 3.0 active on my XP system, both trying to index my files 🙂 So, I assume more people will find themselves in this situation. I’d like my installer to be able to target only WDS 3.0.

    Now, what I found on this: The WDS 3.0 registration instruction at has overlapping registry entries with the Indexing Service part for WDS 2.x (at So, if I use this to register for WDS 3.0, I believe I also register for Indexing Service – which I try to avoid.

    How do I work around this? Is there a dedicated WDS 3.0 registration, as with MOSS, WSS and SQL 2005?

  32. kert says:

    Just a FYI, my custom IFilter is now up on CodePlex under

    It’s a full SharePoint customization project, one part of it being the IFilter.

    I never got it working under that .NET test app though; it works nicely with Desktop Search, WSS and MOSS search.

    Maybe in a future release; meanwhile, patches are welcome 🙂

  33. kert says:

    By the way, I found a small, useful bit of new info on recommended IFilter implementation over at

    specifically, it lists the useful PROPIDs to support.

  34. Deb Haldar says:

    This might be a long shot, but do you have the XD/NX bit enabled on your Intel/AMD CPU?

    It might be an interesting experiment to disable it and run the same tests. The XD/NX check can be disabled from the BIOS.

     More Info:

  35. Marcovanschagen says:

    I will try to find the XD/NX setting, it is a dual core laptop; I will report back on this.

  36. Charan says:

    Hi ,

    I have a problem working with IFilters.

    I wrote a small app to load an IFilter and check the return value.

    Code —

    void *myIFilter = NULL;
    IUnknown *myIUnknown = NULL;

    switch (LoadIFilter(L"c:\\windows\\system32\\offfilt.dll", myIUnknown, &myIFilter))
    {
        case S_OK: printf("1"); break;
        case E_ACCESSDENIED: printf("2"); break;
        case E_HANDLE: printf("3"); break;
        case E_INVALIDARG: printf("4"); break;
        case E_OUTOFMEMORY: printf("5"); break;
        case E_FAIL: printf("6"); break;
        default: printf(" Unknown ERROR "); break;
    }

    The problem with the above code is that it always hits "default". (I am able to compile and generate an exe for the above app. I also tried declaring IUnknown myiUnknown and IFilter myiFilter directly, but this gives me a lot of compilation errors.)

    I am not able to figure out the problem with the above code. Can someone please help me out on this?



  37. Deb Haldar says:

    What is the specific error code returned by the IFilter?

  38. Marcovanschagen says:

    Hi Charan,

    I noticed in the LoadIFilter call you use the filename of the office IFilter dll. Instead, you could try to provide the filename of the content you’d like to filter.

    Check out the loadIFilter example in the DefaultParser class at – does this help you? The article also shows links to other code samples.

    Personally I have no experience with calling the IFilter routines like this. As I am building an IFilter I do like to see cool code samples of how an IFilter can be used.

  39. Charan says:

    I have included all the error codes that are returned by the LoadIFilter method in the switch statement.

    But the code still hits the default case.

    Anyway, I checked the error code value, and it turns out to be 0x800401F0.

    I checked the value in WinError.h, and only the following macros have that value tied to them:

    #define CO_E_NOTINITIALIZED              _HRESULT_TYPEDEF_(0x800401F0L)

    #define CO_E_FIRST        0x800401F0L
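
    Printing the raw value in the default branch makes this kind of lookup quicker. A minimal sketch of that idea: `hex_code` and `is_failure` are hypothetical helper names, not part of any Windows API, and they take a plain 32-bit integer so the snippet compiles without <windows.h>.

    ```cpp
    #include <cstdint>
    #include <cstdio>
    #include <string>

    // Format a 32-bit status code the way it is written in WinError.h,
    // for example 0x800401F0. (Hypothetical helper, for illustration only.)
    std::string hex_code(std::uint32_t code) {
        char text[11];  // "0x" + 8 hex digits + terminating NUL
        std::snprintf(text, sizeof(text), "0x%08lX",
                      static_cast<unsigned long>(code));
        return std::string(text);
    }

    // True when the top (severity) bit is set, which marks a failure code.
    bool is_failure(std::uint32_t code) {
        return (code & 0x80000000u) != 0;
    }
    ```

    In the switch above, the default branch could then print hex_code(...) of the value returned by LoadIFilter instead of a fixed "unknown" message.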



  40. Charan says:

    Never mind,

    I figured it out myself: I searched for the "CO_E_NOTINITIALIZED" error and learned that we need to initialise COM using the CoInitialize function before using any COM functions.


  41. Marcovanschagen says:

    We are now making progress on the cidaemon.exe errors. The faulting application cidaemon.exe messages have now been addressed.

    Microsoft has supported us with their escalation engineer from the Developer Support Internet Commerce Server/Application Center 2000/Index Server products team. He has helped me and guided me in debugging the IFilter implementation.

    One big error was a bad cast of memory pointers, actually this was it:

    This line:

    *awcBuffer = (WCHAR)CoTaskMemAlloc( memBufferSizeBytes );

    Should have been:

    awcBuffer = (WCHAR*)CoTaskMemAlloc( memBufferSizeBytes );

    Fixing this enabled the Content Indexer (CI) with cidaemon.exe to start processing my files. I wondered why cidaemon from CI would show all these errors where WDS seems to work fine. I assume WDS is a bit friendlier, hiding the errors a bit. If this actually is the case, then working with cidaemon is not such a bad idea, since it forces me to really fix all the problems.

    I learned it quickly ran out of resources. So I started to reuse the memory buffer between GetText calls. This worked pretty well. Before that, I had assumed the calling process needed to free this memory.

    Today I ran into another type of cidaemon error, where just a few DWG files were failing.

    Some DLLs from the DWG reading library were required for only a few specific DWG functions. Adding these DLLs helped solve this issue. In WDS I did not notice these files were skipped; good thing CI was showing me the error in my face…

    One CI-specific thing is that I had to limit the text output of the filter to avoid another cidaemon error. I added some counting/tracking code to my IFilter and added logging just before the point of failure.

    The allowed size of text seems to depend on the CI registry settings; I believe that by default (my system is running XP) 3000 wchar chars is just safe. In my case this is the total text as emitted in the first IFilter chunk, where I have only one chunk emitting text. I am not sure if using more GetChunk chunks would allow for more text.

    To limit the output for cidaemon only, without affecting WDS, WSS or MOSS, I have added a check to see in which daemon process the code is running. To do so I added a check similar to this code (inspired by //

    #include <windows.h>
    #include <psapi.h>    // EnumProcessModules, GetModuleBaseName (link with psapi.lib)
    #include <comdef.h>   // _bstr_t

    DWORD MyProcessID = GetCurrentProcessId();
    TCHAR szProcessName[MAX_PATH] = TEXT("<unknown>");

    // Get a handle to the process.
    HANDLE hProcess = OpenProcess( PROCESS_QUERY_INFORMATION | PROCESS_VM_READ, FALSE, MyProcessID );

    // Get the process name.
    if (NULL != hProcess )
    {
        HMODULE hMod;
        DWORD cbNeeded;

        if ( EnumProcessModules( hProcess, &hMod, sizeof(hMod), &cbNeeded) )
        {
            GetModuleBaseName( hProcess, hMod, szProcessName, sizeof(szProcessName)/sizeof(TCHAR) );
        }
        CloseHandle( hProcess );
    }

    // Keep the process name for the caller.
    retVal = _bstr_t( szProcessName ); // + _bstr_t( processID );
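
    The name obtained this way still has to be compared with the daemon's file name. A minimal sketch of that last step, under stated assumptions: the helper names `to_lower_ascii`, `is_cidaemon` and `text_limit_for_host` are hypothetical, the process name is assumed to have been converted to a narrow std::string, and the 3000-character figure is simply the "just safe" number mentioned above, not a documented limit.

    ```cpp
    #include <cctype>
    #include <cstddef>
    #include <string>

    // Lower-case an ASCII string so the comparison ignores case,
    // since Windows file names are case-insensitive.
    std::string to_lower_ascii(std::string text) {
        for (char& c : text) {
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        }
        return text;
    }

    // True when the host process is the Indexing Service filter daemon.
    // baseName is the module base name, e.g. what GetModuleBaseName fills in.
    bool is_cidaemon(const std::string& baseName) {
        return to_lower_ascii(baseName) == "cidaemon.exe";
    }

    // Text limit for the current host, in characters; 0 means "no limit".
    // The value 3000 is an assumption taken from the discussion above.
    std::size_t text_limit_for_host(const std::string& baseName) {
        return is_cidaemon(baseName) ? 3000u : 0u;
    }
    ```

    With this split, the Windows-specific lookup stays in one place and the decision logic can be checked without an indexer running.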

    Another thing I did earlier in the process, is limiting the text emitted from each GetText. Actually I believe it was a WDS thing that inspired me to do so. I have limited this to 1020 wchar chars.
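
    Limiting the text per GetText call means the filter has to remember how far it got in the chunk, otherwise the same text is handed out again. A portable sketch of that bookkeeping, assuming a hypothetical `TextCursor` class (not part of the IFilter API); a real IFilter::GetText would report the count through its buffer-length parameter and signal the end with FILTER_S_LAST_TEXT or FILTER_E_NO_MORE_TEXT.

    ```cpp
    #include <algorithm>
    #include <cstddef>
    #include <string>
    #include <utility>

    // Hands out a chunk's text in bounded pieces, one piece per call,
    // mirroring how an indexer calls GetText repeatedly for one chunk.
    class TextCursor {
    public:
        TextCursor(std::wstring text, std::size_t maxPiece)
            : text_(std::move(text)), maxPiece_(maxPiece), offset_(0) {}

        // Copies the next piece into buffer (capacity counted in wchar_t).
        // Returns the number of characters copied; 0 means nothing is left.
        std::size_t next(wchar_t* buffer, std::size_t capacity) {
            const std::size_t remaining = text_.size() - offset_;
            const std::size_t count = std::min({remaining, capacity, maxPiece_});
            std::copy(text_.data() + offset_, text_.data() + offset_ + count, buffer);
            offset_ += count;  // advance so the next call never repeats old text
            return count;
        }

        // True once every character has been handed out.
        bool done() const { return offset_ == text_.size(); }

    private:
        std::wstring text_;
        std::size_t maxPiece_;
        std::size_t offset_;
    };
    ```

    Constructed with a limit of 1020, a 2500-character text comes out as pieces of 1020, 1020 and 460 characters, and the cursor reports done() after the third call.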

    I am still struggling with some errors in cidaemon. I may have a buildup of memory that is not being freed.

    At the start of the project I learned the IFilter is loaded into memory whenever it is needed, and I just assumed it is unloaded from memory after each file is processed, releasing and freeing all resources. This is not the case; it can stay in memory to process many files. So I now believe it can easily run out of resources when you do not free your resources as you should.

    I have to say having the help of the engineer at Microsoft meant a great deal in resolving most of the issues I have described here. So, thank you C.R., without your help I would not have gotten this far!

  42. Charan says:

    Hi ,

    I have a problem extracting text from a PowerPoint file (.ppt).

    The PPT file has an "Organization Chart" and each node in the chart has a label.

    I don’t find that label in the text I extracted from the PPT file through IFilters.

    I tried to do a "Find" for that label in PowerPoint, and I am able to see the hit for that label in the file.

    (Even the "WORD ART" text is not returned by the IFilter, but this is fine since "Find" in PowerPoint doesn’t search for such text.)

    I think this is a limitation of the IFilter (for PowerPoint): it does not return that kind of text.

    Am I right on this?

    If yes, then where can I find a detailed note on the limitations of the Office IFilter (offfilt.dll)?



  43. Charan says:

    Hi Marco van Schagen ,

    I didn’t have much knowledge of using IFilters until this evening 🙂

    I thought that I needed to pass the Office IFilter DLL to the LoadIFilter routine (which is wrong, of course).

    That is the reason you found it in my earlier code. After some research I figured out how things work, and I was finally able to build a small working app that uses the IFilter routines to extract text from some standard file formats (.doc, .pdf, etc.).

    I see that you are interested in code samples that use the IFilter routines; the code I have written may help you.

    My apologies if I am not supposed to post such a big message 🙂

    void *myIFilter = NULL;
    IUnknown *myIUnknown = NULL;

    if (CoInitialize(NULL) != S_OK)
    {
        return false;
    }

    // Pass the document to be filtered, not the filter DLL.
    if (LoadIFilter(L"C:\\Temp\\ab.doc", myIUnknown, &myIFilter) == S_OK)
    {
        IFilter *filter = (IFilter *)myIFilter;
        ULONG initFlags = 0;    // add IFILTER_INIT_* flags as required
        ULONG initOutflags = 0;

        if (filter->Init(initFlags, 0, NULL, &initOutflags) == S_OK)
        {
            STAT_CHUNK mySC;
            memset(&mySC, 0, sizeof(mySC));

            while (1)
            {
                long rcGC = filter->GetChunk(&mySC);
                if (rcGC == S_OK)
                {
                    if (mySC.flags & CHUNK_TEXT)
                    {
                        WCHAR myBuffer[512];
                        char buf[512];
                        FILE *fp = fopen("c:\\temp\\output.txt", "a");

                        while (fp != NULL)
                        {
                            ULONG myBufLen = sizeof(myBuffer) / sizeof(myBuffer[0]);
                            long rc = filter->GetText(&myBufLen, myBuffer);

                            switch (rc)
                            {
                            case S_OK:
                            case FILTER_S_LAST_TEXT:   // the last piece still carries text
                                // For English text we don't need wchar
                                for (ULONG i = 0; i < myBufLen; i++)
                                    buf[i] = (char)myBuffer[i];
                                fwrite(buf, sizeof(buf[0]), myBufLen, fp);
                                if (rc == FILTER_S_LAST_TEXT) { printf(" END !"); rc = -1; }
                                break;
                            case FILTER_E_NO_TEXT:
                            case FILTER_E_NO_MORE_TEXT:
                                printf(" END !"); rc = -1; break;
                            default:
                                printf("Unknown ERROR "); rc = -1;
                            }
                            if (rc == -1)
                                break;
                        }
                        if (fp != NULL)
                            fclose(fp);
                    }
                    else
                    {
                        printf("we have a Value-type property here ");
                    }
                }
                else if (rcGC == FILTER_E_END_OF_CHUNKS)
                { printf("End of Chunks "); break; }
                else if (rcGC == FILTER_E_EMBEDDING_UNAVAILABLE)
                { printf("Embedding problem"); break; }
                else if (rcGC == FILTER_E_LINK_UNAVAILABLE)
                { printf("Link Problem"); break; }
                else if (rcGC == FILTER_E_ACCESS)
                { printf("Access Problem"); break; }
                else
                { printf("Unknown Problem "); break; }
            } // while(1)
        }
        filter->Release();
    }
    CoUninitialize();



  44. Charan says:

    Note: I don’t handle the value-type property in the above code; it takes a lot of time to implement. I still need to figure out how to handle the value-type property.


  45. Deb Haldar says:

    Marco, it’s great news that you were able to drill down to the bottom of this. Regarding

    "In my case this is the total text as emitted in the first IFilter chunk; where I have only one chunk emitting text. I am not sure if using more GetChunk chunks would allow for more text."

    Actually, if you’re having problems filtering large files, using more chunks is an excellent idea.

    Also a heads up: the next version of Windows will put a security cookie in place to prevent the use of temp files. You might want to start a dialogue with the AutoCAD folks to provide stream-based loading functionality in the library used by the filter.

  46. Deb Haldar says:


    There is no official document detailing the limitations of offfilt. We look into each issue on its merits and resolve/fix it accordingly.

    Can you please send me the repro file with a crisp description of the problem?



  47. Charan says:

    Deb ,

    I will send the file on which I found the problem, along with the other details.

    May I have the mail ID of the person to whom I should send the details?



  48. Deb Haldar says:


    Ideally you’d send the file to the Microsoft Product Support contact for your company.

    However, if you do not have a MS PSS contact, please send them to and



  49. Marcovanschagen says:

    My apologies, my previous post was based on a bad piece of code. I have made many changes, and the errors disappeared. Good thing. The indexing result also disappeared. Not a good thing.

    I was thinking that sharing one buffer between GetText calls may cause the indexer to read the last text from the buffer over and over again. I’ll need to check this.

    My Microsoft contact tells me I am having an issue with stack space. I am still working on this one.

  50. I’m pleased to announce that after a lot of trials and tribulations, the DWG IFilter is finally ready.

  51. Albert says:

    Hi there,

    Thanks for all the great tips!

    For windows desktop search, do you know how the text in the preview pane is derived?  

    For me it looks like the contents of the first call to getText() is shown in the preview.

  52. Charan says:

    I have one doubt with IFilter …

    is it possible to ask iFilter to give the textual contents in some defined order ?

    Meaning that, for example, if you take a PPT file, the client should be able to distinguish between the text retrieved from a header and the text from a page.

    Is there any way to specify these things to the IFilter, so that I know which section of the document I am dealing with and can handle those blocks of text accordingly?

  53. Sohaib says:

    Hi Charan

    It’s great news that you figured out this tricky thing. It has been a long time, but still, congrats 🙂

  54. Gabriella says:

    Does anyone know if an IFilter exists for ISO files?

    Thanks a lot,


  55. Jerry Camel says:

    I have a bit of an IFilter mystery to solve. I’m writing a custom IFilter for a file format that might contain some embedded MIME stuff… How can I get a handle to the MIME IFilter so that I can pass the chunk requests through? It seems everything is set up to easily deal with OLE structured storage docs, but not MIME. If I pass the stream off with BindIFilterFromStream, I just get a message stating that the OLESTREAM format is incorrect. The data is in a stream, not a file…