Today’s post was delayed slightly, but we have good news: announcing the availability of additional language model datasets. As always, the easiest way to get a list is to simply navigate to http://web-ngram.research.microsoft.com/rest/lookup.svc. Shown below are the new items, in URN form:

- urn:ngram:bing-title:apr10:1
- urn:ngram:bing-title:apr10:2
- urn:ngram:bing-title:apr10:3
- urn:ngram:bing-title:apr10:4
- urn:ngram:bing-title:apr10:5
- urn:ngram:bing-anchor:apr10:1
- urn:ngram:bing-anchor:apr10:2
- urn:ngram:bing-anchor:apr10:3
- urn:ngram:bing-anchor:apr10:4
- urn:ngram:bing-anchor:apr10:5
- urn:ngram:bing-body:apr10:1
- urn:ngram:bing-body:apr10:2…

# Month: October 2010

## Generative-Mode API

In previous posts I described how the Web N-Gram service answers the question: what is the probability of word w in the context c? This is useful, but sometimes you want the reverse: what are some words {w} that could follow the context c? This is where the Generative-Mode APIs come into play. Examples…
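The idea behind generative mode can be sketched with a toy bigram model (this is an illustrative sketch, not the service’s actual REST API; the corpus, function names, and `top_k` parameter are all made up for the example): given a context, enumerate every word observed after it and rank the candidates by their conditional probability.

```python
from collections import Counter

# Toy training corpus; the real service derives its counts from Bing-scale data.
corpus = "the cat sat on the mat the cat ate the fish".split()

bigrams = Counter(zip(corpus, corpus[1:]))  # count(context, word)
context_counts = Counter(corpus[:-1])       # count(context)

def generate(context, top_k=3):
    """Return up to top_k words w ranked by P(w | context)."""
    candidates = {w: c / context_counts[context]
                  for (ctx, w), c in bigrams.items() if ctx == context}
    return sorted(candidates.items(), key=lambda kv: -kv[1])[:top_k]

print(generate("the"))  # "cat" ranks first: it follows "the" in 2 of 4 cases
```

A real n-gram service would do the same ranking over much longer contexts and vastly larger vocabularies, which is why generative mode is exposed as its own API rather than many repeated single-word lookups.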

## Language Modeling 102

In last week’s post, we covered the basics of conditional probabilities in language modeling. Let’s now have another quick math lesson on joint probabilities. A joint probability is useful when you’re interested in the probability of an entire sequence of words. Here I can borrow an equation from Wikipedia: The middle term is the…
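The standard chain-rule decomposition the post refers to factors a joint probability into a product of conditional probabilities. A minimal sketch under a bigram assumption (toy corpus and function names are my own, purely for illustration):

```python
from collections import Counter

corpus = "the cat sat on the mat the cat ate the fish".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def cond_prob(w, context):
    # P(w | context), estimated from raw bigram counts (no smoothing)
    return bigrams[(context, w)] / unigrams[context]

def joint_prob(words):
    """P(w1..wn) ≈ P(w1) * product of P(wi | wi-1) under a bigram model."""
    p = unigrams[words[0]] / len(corpus)
    for prev, w in zip(words, words[1:]):
        p *= cond_prob(w, prev)
    return p

print(joint_prob(["the", "cat", "sat"]))  # (4/11) * (2/4) * (1/2)
```

A full n-gram model conditions each word on more than one predecessor, but the multiplication pattern is exactly the same.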

## Language Modeling 101

The Microsoft Web N-Gram service, at its core, is a data service that returns conditional probabilities of words given a context. But what exactly does that mean? Let me explain. Conditional probability is usually expressed with a vertical bar: P(w|c). In plain English you would say: what is the probability of w given c? In…
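The definition of P(w|c) can be made concrete with counts: the probability of w given c is the number of times c is followed by w, divided by the number of times c occurs at all. A toy sketch (the corpus and helper name are invented for illustration; the real service estimates these ratios over web-scale text):

```python
from collections import Counter

tokens = "the cat sat on the mat".split()

pair_counts = Counter(zip(tokens, tokens[1:]))  # count(c followed by w)
context_counts = Counter(tokens[:-1])           # count(c)

def p(w, c):
    """P(w | c) = count(c, w) / count(c)."""
    return pair_counts[(c, w)] / context_counts[c]

print(p("cat", "the"))  # "the" occurs twice; one occurrence is followed by "cat"
```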