FOXIT vs. Adobe PDF IFilter [ 32-bit only ]
Some time back I had the chance to run a performance and international sufficiency analysis on the Adobe and FOXIT IFilters for some of our customers. That report is now being made available to a broader audience.
PERFORMANCE ANALYSIS OF 32-BIT FOXIT PDF IFILTER vs. ADOBE PDF IFILTER
Machine : Intel Xeon CPU @ 1.4 GHz (4 hyperthreaded processors)
4.00 GB of RAM
32-bit Win2K3 SP1
Indexer performance set to partly reduced.
| | FOXIT v1.0 | ADOBE v.8 |
|---|---|---|
| Total # of PDF documents | 10917 | 10917 |
| # successful crawls | 10871 | 10909 |
| # errors | 44 (expired ebooks etc.) | 0 |
| # warnings | 2 (corrupted doc) | 2 (corrupted doc) |
| CRAWL TIME: | | |
| Portal Content | 00:49:21.163 | 03:34:39.237 |
| Anchor Crawl 1 | 00:02:03.527 | 00:02:39.073 |
| Anchor Crawl 2 | 00:00:02.173 | 00:00:02.437 |
| TOTAL Crawl Time | 00:51:26.863 (~ 51 minutes) | 03:38:00.747 (~ 218 minutes) |
Analysis:
1. The FOXIT filter is about 4.27 times faster than the Adobe filter on a quad-processor machine (roughly 51 minutes versus 218 minutes of total crawl time). This is expected, since the Adobe filter is not truly multithreaded and serializes the threads.
2. The Adobe filter crawls some documents that ideally should not be crawled (expired ebooks etc.); the FOXIT filter reports those 44 documents as errors.
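The speedup figure can be rechecked from the reported totals. The snippet below is a minimal sketch, not part of the original test harness: `parse_duration` is a hypothetical helper, and the only inputs are the two TOTAL Crawl Time values from the table. Dividing the rounded minute figures (218 / 51) gives about 4.27, while the millisecond totals give about 4.24, so the conclusion holds either way.

```python
from datetime import timedelta


def parse_duration(text: str) -> timedelta:
    """Parse an HH:MM:SS.mmm string (hypothetical helper for this sketch)."""
    hours, minutes, seconds = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))


# TOTAL Crawl Time values as reported in the table.
foxit_total = parse_duration("00:51:26.863")
adobe_total = parse_duration("03:38:00.747")

# Dividing one timedelta by another yields a plain float ratio.
speedup_exact = adobe_total / foxit_total

# The same ratio computed from the rounded minute figures.
speedup_rounded = 218 / 51

print(f"speedup from exact totals:    {speedup_exact:.2f}")    # 4.24
print(f"speedup from rounded minutes: {speedup_rounded:.2f}")  # 4.27
```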
INTL SUFFICIENCY ANALYSIS OF 32-BIT FOXIT PDF IFILTER vs. ADOBE PDF IFILTER
Neither the Adobe nor the FOXIT filter returns the correct locale for non-English documents. Both always emit LOCALE = 1033 (en-us). Hence we pass their output to the neutral wordbreaker, which compromises search relevance.
Tests were performed on JPN, CHS, FRE and HEB PDF documents using both the indexer and standalone test tools.
| Language | # Tokens | MOSS returns result with FOXIT? | MOSS returns result with Adobe? | Correct locale emitted by FOXIT? | Correct locale emitted by Adobe? |
|---|---|---|---|---|---|
| JPN | 2 | No | No | No | No |
| CHS | 2 | No | No | No | No |
| FRE | 2 | Yes | Yes | No | No |
| HEB | 2 | Yes | Yes | No | No |
Note that since French is syntactically very close to English, we still get back valid results. In the case of the Hebrew documents, I’d say it is a coincidence that the token the language expert gave me was word-broken correctly.
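To illustrate why a wrong locale hurts some languages more than others, here is a rough sketch. It does not reproduce the actual MOSS neutral wordbreaker: `neutral_break` is a hypothetical stand-in that simply splits on non-word characters, and the sample sentences are made up for the example. The LCID values are the standard Windows locale identifiers for the languages tested.

```python
import re

# Standard Windows locale identifiers (LCIDs) for the tested languages.
LCIDS = {"en-US": 1033, "ja-JP": 1041, "zh-CN": 2052, "fr-FR": 1036, "he-IL": 1037}


def neutral_break(text: str) -> list:
    """Hypothetical stand-in for a neutral word breaker.

    Splits on anything that is not a word character, so it only finds
    word boundaries in languages that separate words with spaces.
    """
    return re.findall(r"\w+", text.lower())


def finds(query: str, document: str) -> bool:
    """Return True if the query matches one of the document's tokens."""
    return query.lower() in neutral_break(document)


# Made-up sample sentences, used only for illustration.
french = "Le filtre extrait le texte du document."
japanese = "東京都に住んでいます。"

# French words are space-delimited, so the token is indexed on its own.
print(finds("filtre", french))   # True

# Japanese has no spaces, so the whole sentence becomes a single token
# and a query for one word inside it does not match.
print(finds("東京", japanese))    # False

# Both filters report 1033 regardless of the document language.
print(LCIDS["en-US"] == LCIDS["ja-JP"])  # False
```

A language-specific wordbreaker selected from the correct LCID would segment the Japanese sentence into separate words, which is what the emitted locale is supposed to make possible.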