Word count of DNWL

I thought it would be interesting to see the word occurrences across the various DNWL blogs, just to seek further confirmation of the geeky attitude of the community 🙂

So I implemented a quick DictionaryTree and scanned SpecialFolder26\SharpReader\Cache (leaving only the DNWL files in it). For the sake of simplicity, I took into consideration only the Title and Description InnerTexts, and I stripped all HTML tags.
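For the curious, the approach can be sketched roughly as follows. This is not the original code: it is a minimal Python approximation that assumes the Title and Description texts are already available as plain strings. The names `TrieNode`, `count_words`, `TAG_RE` and `WORD_RE` are made up for the illustration, and the regex-based tag stripping is as crude as any homebrewed parser.

```python
import re
from html import unescape

TAG_RE = re.compile(r"<[^>]+>")           # naive HTML tag matcher (assumption)
WORD_RE = re.compile(r"[A-Za-z0-9#]+")    # keeps '#' so that C# survives


class TrieNode:
    """One character position in the tree; count > 0 marks a complete word."""
    __slots__ = ("children", "count")

    def __init__(self):
        self.children = {}
        self.count = 0


class DictionaryTree:
    """Hypothetical stand-in for the DictionaryTree mentioned in the post."""

    def __init__(self):
        self.root = TrieNode()

    def add(self, word):
        # Walk (or create) one node per character, then bump the word counter.
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.count += 1

    def items(self):
        # Depth-first walk yielding (word, occurrences) pairs.
        stack = [("", self.root)]
        while stack:
            prefix, node = stack.pop()
            if node.count:
                yield prefix, node.count
            for ch, child in node.children.items():
                stack.append((prefix + ch, child))


def count_words(fragments):
    """Strip tags, upper-case, and count every term; most frequent first."""
    tree = DictionaryTree()
    for fragment in fragments:
        text = unescape(TAG_RE.sub(" ", fragment))
        for word in WORD_RE.findall(text.upper()):
            tree.add(word)
    return sorted(tree.items(), key=lambda kv: (-kv[1], kv[0]))
```

Calling `count_words` on a list of title and description strings returns the terms sorted by frequency, which is the kind of list the "between X and Y" remarks in the fun facts refer to.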

The results are interesting, and definitely funny. You can find the complete list of terms plus occurrences (over 14,500 terms, including some aberrations at the end produced by the homebrewed parser) here.

Fun facts:

1) the term NET occurs 1700 times, between two powerful buzz words (ON and MY)
2) CODE and WEB count 561 and 535 respectively, between THEY and WOULD
3) DON, not surprisingly, is the most used name, with 454 occurrences; I have the slight impression that this doesn't match the stats about the most common English name 🙂
4) the term BLOGS pops up 444 times
5) MICROSOFT wins the most referenced company contest, with 369
6) C# is the most quoted language, with 273 entries (between 2003 and SHOULD)
7) SCOTT seems to be another extremely common name, with 168 entries

OK, OK. The system I used is FAR from perfect: I should have preserved the "." in front of "NET" and so on. But since the frequencies of the "normal" words are meaningful, I believe that the analysis has some significance. In a few weeks I'll repeat the process, just to see whether it gives any insight into trends.
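Just to give an idea of the fix for the missing ".", here is a small Python sketch of a tokenizer that keeps a leading dot when it starts a term. It is only an assumption about how the parser could be improved, not what was actually run; `TOKEN_RE` and `tokenize` are names invented for the example.

```python
import re

# First alternative: a dot that does not follow a letter or digit, plus the
# term after it (".NET"). Second alternative: an ordinary term, with '#'
# allowed so that "C#" stays in one piece.
TOKEN_RE = re.compile(r"(?<![A-Za-z0-9])\.[A-Za-z][A-Za-z0-9#]*|[A-Za-z0-9#]+")


def tokenize(text):
    """Upper-case the text and split it into terms, preserving '.NET'."""
    return TOKEN_RE.findall(text.upper())
```

With this rule, a dot that ends a sentence is still dropped, and "ASP.NET" is still split into ASP and NET, so it is an improvement and not a cure.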


