PDF iFilter Battle, second round


If you remember the last round of our PDF iFilter battle, Foxit won it. In this round we bring in another challenger: TET PDF iFilter. It is also available for x86 and x64, is free for non-commercial desktop use, and needs a license for server installation.

Here are the new results for file set II:

 

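| iFilter | File Number | Total File Size (MB) | Avg File Size (MB) | Crawl Time (m:s) | Crawl Time (s) | Files Per Second | Success | Error |
|---|---|---|---|---|---|---|---|---|
| FoxIT | 2676 | 2406 | 0.90 | 7:46 | 466 | 5.74 | 2759 | 0 |
| Adobe | 2676 | 2406 | 0.90 | 40:58 | 2458 | 1.09 | 2757 | 2 |
| TET | 2676 | 2406 | 0.90 | 13:48 | 828 | 3.23 | 2752 | 0 |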

 

I also obtained an archive of People's Daily from 2001 to 2006: about 20,000 PDF files, 13.4 GB in total. This set was tested on an 8-core Xeon box.

 

 

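| iFilter | File Number | Total File Size (MB) | Avg File Size (MB) | Crawl Time (h:m:s) | Crawl Time (s) | Files Per Second | Success | Error |
|---|---|---|---|---|---|---|---|---|
| FoxIT | 19890 | 13793 | 0.69 | 00:30:53 | 1853 | 10.73 | 19884 | 7 |
| Adobe | 19890 | 13793 | 0.69 | 05:19:04 | 19144 | 1.04 | 19887 | 4 |
| TET | 19890 | 13793 | 0.69 | 01:40:09 | 6009 | 3.31 | 19879 | 12 |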
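
For anyone who wants to rerun the arithmetic on their own corpus, here is a minimal sketch of how the crawl time in seconds and the files-per-second figures can be derived from the clock times. The function names are made up for illustration, and the numbers are simply the ones from the People's Daily run above.

```python
def to_seconds(clock: str) -> int:
    """Convert a 'h:m:s' or 'm:s' clock string into total seconds."""
    seconds = 0
    for part in clock.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def files_per_second(file_count: int, clock: str) -> float:
    """Throughput: number of files divided by crawl time in seconds."""
    return file_count / to_seconds(clock)


# Crawl times reported for the People's Daily archive (19890 files).
FILE_COUNT = 19890
crawl_times = {"FoxIT": "00:30:53", "Adobe": "05:19:04", "TET": "01:40:09"}

for name, clock in crawl_times.items():
    rate = files_per_second(FILE_COUNT, clock)
    print(f"{name}: {to_seconds(clock)} s, {rate:.2f} files/s")
```

Dividing two of these rates gives the relative speed of one filter against another, which is often easier to reason about than the raw times.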

 

And the licensing comparison for production (USD):

| iFilter | Desktop | Server | 1-2 Cores Per Server | 4 Cores Per Server | 8+ Cores Per Server |
|---|---|---|---|---|---|
| Adobe | Free | Free | Free | Free | Free |
| Foxit | Free | Not Free | 329.99 | 589.97 | 1109.93 |
| TET | $119 for commercial usage | Not Free | 595 | 595 | 595 |

 

Summary

It is good to see another vendor join this market. TET showed good performance, although it is still behind Foxit. But because it is licensed per server rather than per core, it costs less than Foxit on a typical two-way quad-core box (8 cores): $595 versus $1,109.93.
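
To make the cost trade-off concrete, the sketch below compares the server license prices from the table for a given core count. The function names are hypothetical, and the table does not say how Foxit prices servers with 3 or 5 to 7 cores, so rounding up to the next listed tier is an assumption made here, not vendor policy.

```python
TET_SERVER_PRICE = 595.00  # flat price per server, from the table


def foxit_server_cost(cores: int) -> float:
    """Foxit server price (USD) for a given core count, using the table tiers."""
    if cores <= 2:
        return 329.99
    if cores <= 4:
        # Assumption: a 3-core server is priced at the 4-core tier.
        return 589.97
    # Assumption: 5 to 7 cores are priced at the 8+ tier.
    return 1109.93


def cheaper_paid_filter(cores: int) -> str:
    """Name the cheaper of the two paid iFilters for one server."""
    return "Foxit" if foxit_server_cost(cores) < TET_SERVER_PRICE else "TET"


for cores in (2, 4, 8):
    print(f"{cores} cores: Foxit ${foxit_server_cost(cores):.2f}, "
          f"TET ${TET_SERVER_PRICE:.2f}, cheaper: {cheaper_paid_filter(cores)}")
```

On these list prices Foxit is cheaper up to 4 cores (by about five dollars at the 4-core tier), and TET only wins once a server has more cores than that.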

Comments (3)

  1. cy21 says:

    Great post.

    What are the errors that were encountered? FoxIT shows 7, Adobe shows 4, and TET shows 12. Are they true errors or are they notices for items that are correctly not crawled, such as expired items, items marked as not to be crawled, password protected, etc…?

    I think this would be a large factor when considering which iFilter to use.  One may consider a slower rate of indexing to be acceptable if a larger percentage of the corpus will be properly indexed.

  2. brian says:

    cy21 raises a good point. I'm currently looking at which is best. I think Adobe fell off the short list pretty quickly, but TET might be worth considering if there is a trade-off between quality and speed. The numbers in this post seem to show that Foxit is slightly more reliable in addition to being much faster than TET. Is this a logical conclusion, or are there other factors involved in the reliability of iFilters? (You seem to be following this technology very closely, and I had not known iFilters existed until today.)
