Lab Bonus! Enhanced Sentiment Analysis for Twitter from Microsoft Research

02/02/2012

What’s new?

In this release, we enhanced our sentiment software by upgrading to the latest version of code from the same Microsoft Research team whose code we used in prior releases. The major improvement in this version is a new classifier trained specifically on tweets. The Microsoft Research sentiment code we used in prior releases was trained on short sentences and paragraphs. We expect that using the tweet-trained classifier for Twitter content items will improve the accuracy of sentiment analysis in Social Analytics. We will continue to use the sentence and paragraph classifiers for all other content.

The tweet classifier was trained on nearly 4 million tweets from over a year’s worth of English Twitter data. It is based on a study of how people express their moods on Twitter with mood-indicating hashtags. We mapped over 150 different mood-bearing hashtags to positive and negative affect, and used the hashtags as a training signal to learn which words and word pairs in a tweet are highly correlated with positive or negative affect.
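To make the idea of hashtags as a training signal concrete, here is a minimal sketch. It is not the Microsoft Research classifier. It assumes a simple stand-in model, a smoothed log-odds score for each word and adjacent word pair, and uses short hypothetical hashtag lists in place of the 150+ mood hashtags the real system mapped. The function names `label_tweet`, `features`, and `train` are illustrative only.

```python
import math
import re
from collections import Counter

# Hypothetical mood-hashtag lists; the real mapping of 150+ hashtags
# to positive/negative affect is not published in this post.
POSITIVE_TAGS = {"#happy", "#excited", "#grateful"}
NEGATIVE_TAGS = {"#sad", "#angry", "#frustrated"}

# Words, optionally prefixed by "#" so hashtags stay intact.
TOKEN_RE = re.compile(r"#?\w+")

def label_tweet(text):
    """Return +1, -1, or None using mood hashtags as the training label."""
    tags = {t for t in TOKEN_RE.findall(text.lower()) if t.startswith("#")}
    pos = bool(tags & POSITIVE_TAGS)
    neg = bool(tags & NEGATIVE_TAGS)
    if pos == neg:  # no mood hashtag at all, or conflicting ones
        return None
    return 1 if pos else -1

def features(text):
    """Unigrams plus adjacent word pairs; hashtags are dropped so the
    model learns from the surrounding words rather than the label itself."""
    words = [t for t in TOKEN_RE.findall(text.lower()) if not t.startswith("#")]
    return words + [f"{a} {b}" for a, b in zip(words, words[1:])]

def train(tweets, smoothing=1.0):
    """Return a smoothed log-odds score per feature (> 0 leans positive)."""
    pos_counts, neg_counts = Counter(), Counter()
    for text in tweets:
        label = label_tweet(text)
        if label is None:
            continue  # unlabeled tweets carry no training signal here
        (pos_counts if label == 1 else neg_counts).update(features(text))
    pos_total = sum(pos_counts.values())
    neg_total = sum(neg_counts.values())
    vocab = set(pos_counts) | set(neg_counts)
    v = len(vocab)
    return {
        f: math.log((pos_counts[f] + smoothing) / (pos_total + smoothing * v))
        - math.log((neg_counts[f] + smoothing) / (neg_total + smoothing * v))
        for f in vocab
    }
```

For example, training on `["what a great day #happy", "great news today #excited", "stuck in traffic again #angry", "missed the bus #sad"]` gives "great" a positive score and "traffic" a negative one, while the hashtags themselves never appear as features. A production system would use far more data and a stronger learner; the log-odds score is chosen here only because it shows the word and word-pair correlation idea in a few lines.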

How we use the sentiment software in Social Analytics

We use the MSR sentiment software to assess the tone of every content item as part of our enrichment process. When the assessment is complete, we store the resulting tone and the reliability of that assessment in two fields, CalculatedToneID and ToneReliability, respectively. Our API and the sample Silverlight client UI expose a content item as positive or negative only if the sentiment engine scores it with a reliability percentage above a threshold we set; otherwise, the item is treated as neutral.

Here is a simple explanation of the three fields related to sentiment in the ContentItem table:

Field	Description
CalculatedToneID	The sentiment (or tone) of the content item as determined by the sentiment software: 3 = Neutral, 5 = Positive, 6 = Negative.
ToneReliability	The reliability of the tone calculation as determined by the sentiment software. The reliability thresholds are currently 80% for positive sentiment and 90% for negative sentiment. If the reliability falls below the threshold, CalculatedToneID is set to Neutral.
ToneID	If a user sets the sentiment manually in our UI or through an API call, that tone is stored in this field. If ToneID is set, we show it instead of CalculatedToneID in the UI and return it in API calls.
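The rules in the table can be summarized as a small sketch. It assumes that the engine's raw output is one of the tone IDs above plus a reliability expressed as a percentage (0 to 100), and that a score exactly at the threshold counts as reliable; the post does not specify either detail. The function names `calculated_tone` and `displayed_tone` are hypothetical and not part of the Social Analytics API.

```python
from typing import Optional

# Tone IDs as listed in the ContentItem table.
NEUTRAL, POSITIVE, NEGATIVE = 3, 5, 6

# Current reliability thresholds from the post, as percentages.
THRESHOLDS = {POSITIVE: 80.0, NEGATIVE: 90.0}

def calculated_tone(raw_tone: int, reliability: float) -> int:
    """Value stored in CalculatedToneID: fall back to Neutral when the
    reliability is below the threshold for the engine's raw tone.
    Assumption: reliability equal to the threshold is kept."""
    threshold = THRESHOLDS.get(raw_tone)
    if threshold is None or reliability < threshold:
        return NEUTRAL
    return raw_tone

def displayed_tone(calculated_tone_id: int, tone_id: Optional[int]) -> int:
    """Tone shown in the UI and returned by the API: a manually set
    ToneID takes precedence over CalculatedToneID."""
    return tone_id if tone_id is not None else calculated_tone_id
```

With these rules, a positive score at 85% reliability stays positive, while a negative score at 85% is stored as neutral because the negative threshold is stricter. If a user later sets ToneID, that value is what the UI and API report.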

For more details on this Microsoft Research project, see https://research.microsoft.com/tweetaffect.