UPDATE: Serving New Models

10/25/2010

Today's post was delayed slightly, but we have good news: additional language model datasets are now available. As always, the easiest way to get the list is to navigate to https://web-ngram.research.microsoft.com/rest/lookup.svc. The new items are shown below, in URN form:

urn:ngram:bing-title:apr10:1
urn:ngram:bing-title:apr10:2
urn:ngram:bing-title:apr10:3
urn:ngram:bing-title:apr10:4
urn:ngram:bing-title:apr10:5
urn:ngram:bing-anchor:apr10:1
urn:ngram:bing-anchor:apr10:2
urn:ngram:bing-anchor:apr10:3
urn:ngram:bing-anchor:apr10:4
urn:ngram:bing-anchor:apr10:5
urn:ngram:bing-body:apr10:1
urn:ngram:bing-body:apr10:2
urn:ngram:bing-body:apr10:3
urn:ngram:bing-body:apr10:4
urn:ngram:bing-body:apr10:5

Those of you familiar with the naming scheme will notice right away that we're now supporting 5-grams for the three main streams. What the naming scheme doesn't capture is that, unlike the jun09 dataset for the body stream, the apr10 body dataset has a cutoff of 10. The title and anchor streams still have a cutoff of 0, as did all of the jun09 streams.
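If you want to work with these identifiers in code, here is a minimal Python sketch that splits a URN of the form shown above into its parts and reports the highest n-gram order listed for each stream. The field labels (stream, version, order) and the helper name parse_model_urn are our own for illustration and are not part of the service; the sketch also assumes every URN has exactly the five colon-separated fields seen in the list.

```python
from typing import NamedTuple


class ModelUrn(NamedTuple):
    stream: str   # e.g. "bing-body" (label chosen for this sketch)
    version: str  # e.g. "apr10"
    order: int    # n-gram order, e.g. 5 for 5-grams


def parse_model_urn(urn: str) -> ModelUrn:
    """Split 'urn:ngram:<stream>:<version>:<order>' into its fields."""
    parts = urn.strip().split(":")
    if len(parts) != 5 or parts[0] != "urn" or parts[1] != "ngram":
        raise ValueError(f"not an n-gram model URN: {urn!r}")
    return ModelUrn(stream=parts[2], version=parts[3], order=int(parts[4]))


# Rebuild the fifteen URNs announced in this post.
urns = [
    f"urn:ngram:bing-{stream}:apr10:{order}"
    for stream in ("title", "anchor", "body")
    for order in range(1, 6)
]

# Track the highest order seen for each stream.
max_order: dict[str, int] = {}
for model in map(parse_model_urn, urns):
    max_order[model.stream] = max(max_order.get(model.stream, 0), model.order)

print(max_order)
```

Note that the cutoff values are not encoded in the URN, so they cannot be recovered this way.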
