The Microsoft Office 2003 French spell-checker vs. the OpenOffice speller

 

A few weeks ago, Professor Jean Véronis, from the University of Aix-en-Provence in France, compared the new Microsoft Office French spell-checker with the French speller included in the Google Toolbar (see here for a summary in English). Last week, he carried out a similar experiment comparing our new MS French speller and grammar checker with the OpenOffice speller. Of course, his experiment is based on a very small corpus (one newspaper article into which he automatically injected spelling mistakes). It would also be nice to see how the two systems behave on highly edited texts. Moreover, counting correct and false flags reveals only part of the truth: OpenOffice does not offer any grammar checking, for instance, while MS Office does (in doing so, the latter catches more errors but also, once in a while, erroneously flags a correct construction). There are other differences that the small text used in Prof. Véronis’ experiment does not bring to light. He mentions the French spelling reform, which the OO speller does not take into account; the treatment of feminine job titles should be added to this, since these are two crucial features of the new Microsoft speller.

Anyway, it's nice to read that he considers we have a strong conceptual lead (especially in grammar checking, where he points out we have improved things noticeably) and that the global metrics show the MS speller is better than the OpenOffice one. Since Prof. Véronis’ blog is in French, I thought it would be nice to offer an English version for the benefit of all those who are interested in proofing tools but cannot read the original.

Thierry Fontenelle [MSFT]
Microsoft Speech & Natural Language

Prof. Jean Véronis’ original blog in French can be found here.

Ortograf: OpenOffice vs Microsoft

(Prof Jean Véronis, Aix-en-Provence)

The launch of the French version of OpenOffice a few days ago prompted me to expand my comparison of spell-checkers. In a previous study [here], I compared the Microsoft Word speller with the spell-checking function in the Google Toolbar. MS Word came out clearly ahead (with the patch, i.e. the new version, which it is important to download), mainly thanks to its good treatment of proper names and, to a lesser extent, to its grammar checking (in particular for agreement problems). As we are going to see, the match with OpenOffice is tighter.

I kept the same text for the evaluation (an article from the newspaper Le Monde which I submitted to my "spell-wrecker": here). Here are the results (noise refers to false flags; silence refers to missed flags, i.e. mistakes that were not spotted):

Without proper names or foreign words

%                      Noise    Silence
MSWord (with patch)    1,7      21,3
OpenOffice             0,0      25,3
Google                 1,7      24,0

With proper names and foreign words

%                      Noise    Silence
MSWord (with patch)    9,3      20,0
OpenOffice             6,0      27,6
Google                 34,7     22,4

If we don’t take into account the proper names and foreign words used in the text, OpenOffice has slightly fewer false flags than MSWord, but fails to spot more mistakes. The tendency is the same if one takes into account proper names and English words: a little less noise and more silence. It is important to make sure that OpenOffice is set to use the option "detect all languages": language detection seems to work fine, at least on my example. The sentence "Do you like roast-beef?", cited in English in the text, is identified as an English sentence by OpenOffice, but not by MSWord. (One has to add that language detection on such small fragments is a very delicate process.)

As we can see, the results are very close. On the whole, there is a slight advantage for MSWord over OpenOffice (Google is far behind). I don’t want to be too technical here, but this can be measured more precisely by penalizing noise and silence equally, using the F-measure (which combines precision and recall into a single score): the F-measure is slightly higher for MSWord in both cases (87,4% vs. 85,4% in the first case; 85,0% vs. 81,8% in the second).
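For readers who want to check the arithmetic, here is a rough sketch of the calculation. It assumes that precision is 100 minus noise, that recall is 100 minus silence, and that the F-measure is the balanced harmonic mean of the two. The post does not spell out these definitions, so this is an interpretation, although it is consistent with the figures quoted. The function name f_measure is purely illustrative.

```python
def f_measure(noise_pct: float, silence_pct: float) -> float:
    """Balanced F-measure in percent.

    Assumption (not stated in the post): precision = 100 - noise
    and recall = 100 - silence, both expressed in percent.
    """
    precision = (100.0 - noise_pct) / 100.0
    recall = (100.0 - silence_pct) / 100.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall, scaled back to percent
    return 100.0 * 2 * precision * recall / (precision + recall)


# (noise, silence) percentages copied from the two tables
results = {
    "without proper names": {"MSWord": (1.7, 21.3), "OpenOffice": (0.0, 25.3)},
    "with proper names": {"MSWord": (9.3, 20.0), "OpenOffice": (6.0, 27.6)},
}

for case, systems in results.items():
    for name, (noise, silence) in systems.items():
        print(f"{case:22s} {name:11s} F = {f_measure(noise, silence):.1f}%")
```

Under these assumptions the recomputed scores land very close to the ones quoted; a difference in the last digit can arise because the published noise and silence percentages are themselves rounded.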

The performance of the OpenOffice speller is not bad at all if one takes into account its open nature and the more limited means available for its development. Of course, more in-depth analyses should be carried out on a larger scale, with other types of texts, and my experiment has only an indicative value. However, I think that the developers of the French version of OpenOffice should be vigilant. Microsoft has obviously resumed the development of its French proofing tools, with a very competent team. On some fronts, its conceptual lead is very strong, even if these figures do not yet show it very much. Grammar checking is a case in point, a field in which, as I pointed out the other day here, Microsoft is now noticeably improving things. Let us also point out that OpenOffice does not yet integrate the spelling reform which (since 1990...) has been recommended by the Conseil Supérieur de la Langue Française and the Académie Française, for words such as règlementaire, révolver, ambigüe, etc. (but it would be fairly easy to integrate it).