Building corpora with the Live Search API

I just read Building and Exploring Web Corpora, which includes the Proceedings of the 3rd Web as Corpus Workshop (WAC3-2007) held at the University of Louvain-la-Neuve in September 2007. A number of papers describe how computational linguists have been using Microsoft’s Live Search Application Programming Interface (API) to build and clean corpora to be used…


Untied Nations or United Nations?

During my vacation in December 2007, I had a chance to visit a friend of mine who works for the United Nations in Bangkok. On a Friday evening right before Secretary-General Ban Ki-moon’s visit to the UN Bangkok office, I chatted with his colleagues in the UN building over beer and wine. Many of them…


MSR blog on the Microsoft Research Machine Translation system

Our colleagues from the Microsoft Research (MSR) group have started blogging about the statistical machine translation (MSR-MT) system they are developing. We announced the Windows Live Translator when it was launched in September. Check out their blog if you want to know all the details about this system. For instance, you will discover how to…


New Blog on Enterprise Search

There’s a new blog out there from the team that’s working on Enterprise Search for MOSS (Microsoft Office SharePoint Server). They’ve got tips and tricks for administrators and will be posting info on features. It’s a team that we work closely with, delivering query spelling suggestions and tokenization with morphological analysis. Some of the most recent news…


The French spelling reform in the Canadian press

For readers who are interested in the French spelling reform, two articles published a few days ago in Montreal newspapers discuss the penetration of the reform and its slow but increasing adoption by teachers and the press in Canada, Belgium, Switzerland and France. Both articles, which quote Chantal Contant from…


Contextual spelling: US English only?

Laurie asked us via the Email/Contact link: I was always under the impression that the Contextual Spell Checker only works if your language is set to English (US) rather than English (UK). However, I have recently seen the blue squiggly lines appear for English (UK). Can you confirm whether this has come about as a result…


When Languages Die

James was talking about endangered languages the other day. I have just finished reading David Harrison’s new book, “When Languages Die – The Extinction of the World’s Languages and the Erosion of Human Knowledge”, which I discovered via Michael Kaplan’s blog. It’s a fascinating account of language disappearance, which takes place because thousands of…


Fellow linguist blogger in Windows International

Kieran is a fellow linguist on the Windows International team, working closely with the team delivering Windows Desktop Search. She’s got some great insight into language and technology on her “Loneliness of the Long Distance Linguist” blog. Check her out here: Linguists, we are everywhere! 🙂   — Jay Waltmunson (Program Manager)


Japanese Word Breaking in Windows Desktop Search

Jonas Barklund is a veteran developer working on Windows Desktop Search. He’s got a great post with details on how Windows Search works in Japanese, using our Natural Language Group word breaker. Check it out: — Jay Waltmunson (Program Manager)


Smiley is 25

This week marked the 25th birthday of the smiley :-).  But are emoticons really only 25?  Language Log has some history. — James Lyle (Test Lead)