UPDATE: Serving New Models

Today’s post was delayed slightly, but we have good news: additional language-model datasets are now available. As always, the easiest way to get the full list is to navigate to http://web-ngram.research.microsoft.com/rest/lookup.svc. The new items, in URN form, are:

- urn:ngram:bing-title:apr10:1
- urn:ngram:bing-title:apr10:2
- urn:ngram:bing-title:apr10:3
- urn:ngram:bing-title:apr10:4
- urn:ngram:bing-title:apr10:5
- urn:ngram:bing-anchor:apr10:1
- urn:ngram:bing-anchor:apr10:2
- urn:ngram:bing-anchor:apr10:3
- urn:ngram:bing-anchor:apr10:4
- urn:ngram:bing-anchor:apr10:5
- urn:ngram:bing-body:apr10:1
- urn:ngram:bing-body:apr10:2
…


Generative-Mode API

In previous posts I described how the Web N-Gram service answers the question: what is the probability of word w in the context c? This is useful, but sometimes you want the reverse: what are some words {w} that could follow the context c? This is where the Generative-Mode APIs come into play. Examples…
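The idea behind generative mode can be sketched with a toy bigram model; this is purely illustrative and not the service’s API — the corpus, the `generate` helper, and its signature are all made up for the example. Given a context word, it returns candidate next words ranked by their conditional probability:

```python
from collections import Counter, defaultdict

# Toy corpus standing in for the Bing-scale data behind the service.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count bigrams: context word -> Counter of words that follow it.
follows = defaultdict(Counter)
for c, w in zip(corpus, corpus[1:]):
    follows[c][w] += 1

def generate(context, k=3):
    """Return up to k (word, probability) pairs that could follow context."""
    counts = follows[context]
    total = sum(counts.values())
    return [(w, n / total) for w, n in counts.most_common(k)]

print(generate("the"))  # -> [('cat', 0.5), ('mat', 0.25), ('fish', 0.25)]
```

The lookup direction is simply inverted: instead of scoring one known word against a context, the model enumerates the words it has seen in that context, which is what makes completion-style applications possible.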


Language Modeling 102

In last week’s post, we covered the basics of conditional probabilities in language modeling. Let’s now have another quick math lesson, this time on joint probabilities. A joint probability is useful when you’re interested in the probability of an entire sequence of words. Here I can borrow an equation from Wikipedia: The middle term is the…
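The chain rule expands a joint probability into a product of conditionals, P(w1…wn) = Πᵢ P(wi | w1…wi−1). A minimal sketch of the computation, using a bigram approximation and hypothetical conditional probabilities (the table below is invented for the example; it is not data from the service):

```python
import math

# Hypothetical conditional probabilities P(w | c) for a bigram model.
# These numbers are made up for illustration.
cond_prob = {
    ("<s>", "the"): 0.2,
    ("the", "quick"): 0.01,
    ("quick", "fox"): 0.05,
}

def joint_log_prob(words):
    """Chain rule under a bigram approximation:
    log P(w1..wn) = sum_i log P(wi | w(i-1)).
    Summing in log space avoids numeric underflow on long sequences."""
    logp = 0.0
    for c, w in zip(words, words[1:]):
        logp += math.log10(cond_prob[(c, w)])
    return logp

print(joint_log_prob(["<s>", "the", "quick", "fox"]))  # log10(0.2 * 0.01 * 0.05) = -4.0
```

Working in base-10 logs also matches how n-gram probabilities are conventionally reported: multiplying many small probabilities underflows quickly, while their log values add safely.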


Language Modeling 101

The Microsoft Web N-Gram service, at its core, is a data service that returns conditional probabilities of words given a context. But what exactly does that mean? Let me explain. Conditional probability is usually expressed with a vertical bar: P(w|c). In plain English you would say: what is the probability of w given c? In…
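A maximum-likelihood estimate makes P(w|c) concrete: count how often the context c is followed by w, divided by how often c occurs at all. A minimal sketch on a toy corpus (the corpus and the `p` helper are invented for illustration; the real service computes this over web-scale counts):

```python
from collections import Counter

# Tiny toy corpus; counts stand in for the web-scale data behind the service.
corpus = "the cat sat on the mat and the cat slept".split()

bigrams = Counter(zip(corpus, corpus[1:]))  # counts of (c, w) pairs
unigrams = Counter(corpus)                  # counts of single words

def p(w, c):
    """Maximum-likelihood estimate of P(w | c) = count(c w) / count(c)."""
    return bigrams[(c, w)] / unigrams[c]

print(p("cat", "the"))  # "the" occurs 3 times, followed by "cat" twice -> 2/3
```

In other words, reading P(w|c) left of the bar tells you what you are predicting, and right of the bar tells you what you already know.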
