Have you ever wondered whether a book you are about to read is pretty positive or a bit of a depressing downer? Are you, like me, too lazy to actually read it to find out? Well, look no further. I have made a very simple e-book sentiment analyzer using Azure Cognitive Services. It gives the overall sentiment of an e-book and also the sentiment for each chapter, in case you want to jump straight to the good parts.
To follow this tutorial, you will need:
You can obtain a Cognitive Services API key by:
You can find more details on that here. Alternatively, you can get a free trial API key.
There are many different e-book formats out there. In this demo, I have chosen to use the EPUB format, which is supported by many browsers and e-readers. You can find books in this format (many of them free, public domain) online. The books I have used were found on https://www.feedbooks.com.
I have written my e-book analyzer in C# (.NET), but the Cognitive Services API is really a simple REST interface that you should be able to call from pretty much any application or language. To parse the EPUB files, I have used the VersOne.Epub library. The code for that library is on GitHub.
You can find the complete source code for the e-book sentiment analyzer on GitHub. It is a pretty simple, single source file application. In fact, it is so short and simple that I will just reproduce the source code here:
using System;
using System.Text;
using HtmlAgilityPack;
using VersOne.Epub;
using Microsoft.ProjectOxford.Text.Core.Exceptions;
using Microsoft.ProjectOxford.Text.Sentiment;

namespace EpubSentiment
{
    class Program
    {
        // Extracts the plain text of a chapter, splits it into chunks,
        // and adds each chunk to the request as a separate document.
        static void AppendChapter(ref SentimentRequest request, EpubChapter chapter)
        {
            HtmlDocument htmlDocument = new HtmlDocument();
            htmlDocument.LoadHtml(chapter.HtmlContent);

            // SelectNodes returns null when there are no matching nodes
            var textNodes = htmlDocument.DocumentNode.SelectNodes("//text()");
            if (textNodes == null)
            {
                return;
            }

            StringBuilder sb = new StringBuilder();
            foreach (HtmlNode node in textNodes)
            {
                sb.AppendLine(node.InnerText.Trim());
            }
            string chapterText = sb.ToString();

            int maxCharacters = 3 * 1024; //Max characters that we will send to sentiment API
            int chunks = (int)Math.Ceiling((double)chapterText.Length / (double)maxCharacters);
            int charsPerChunk = (int)Math.Ceiling((double)chapterText.Length / (double)chunks);
            int offset = 0;
            for (int i = 0; i < chunks; ++i)
            {
                // The last chunk may be shorter than the others
                if (offset + charsPerChunk > chapterText.Length)
                {
                    charsPerChunk = chapterText.Length - offset;
                }
                var testText = chapterText.Substring(offset, charsPerChunk);

                // Number the chunk by its position in the request so that IDs stay
                // unique when a chapter and its subchapters share one request
                string chunkID = "CHUNKDOCUMENT" + request.Documents.Count;
                var doc = new SentimentDocument() { Id = chunkID, Text = testText, Language = "en" };
                request.Documents.Add(doc);
                offset += charsPerChunk;
            }
        }

        static void Main(string[] args)
        {
            if (args.Length != 2)
            {
                Console.WriteLine("Usage: ");
                Console.WriteLine(" " + System.AppDomain.CurrentDomain.FriendlyName + " <FILENAME> <APIKEY>");
                Environment.Exit(1);
            }

            string bookfile = args[0];
            string apiKey = args[1];
            Console.WriteLine("Analyzing book: " + bookfile);

            EpubBook epubBook = EpubReader.ReadBook(bookfile);
            string title = epubBook.Title;
            string author = epubBook.Author;
            Console.WriteLine("Book title: " + title);
            Console.WriteLine();

            double bookScore = 0.0;
            int numChapters = 0;
            foreach (EpubChapter chapter in epubBook.Chapters)
            {
                // One request per chapter, including the text of its subchapters
                var request = new SentimentRequest();
                string chapterTitle = chapter.Title;
                AppendChapter(ref request, chapter);
                foreach (EpubChapter subChapter in chapter.SubChapters)
                {
                    AppendChapter(ref request, subChapter);
                }

                var client = new SentimentClient(apiKey);
                var response = client.GetSentiment(request);
                foreach (Microsoft.ProjectOxford.Text.Core.DocumentError e in response.Errors)
                {
                    Console.WriteLine("Errors: " + e.Message);
                }

                // The chapter score is the plain average of its chunk scores
                double score = 0.0;
                int numScores = 0;
                foreach (SentimentDocumentResult r in response.Documents)
                {
                    score += r.Score;
                    numScores++;
                }

                // Skip chapters without any score to avoid dividing by zero
                if (numScores == 0)
                {
                    Console.WriteLine(chapterTitle + ": no score returned, skipping");
                    continue;
                }

                score /= numScores;
                Console.WriteLine(numChapters + ": " + chapterTitle + ", score: " + score);
                bookScore += score;
                numChapters++;
            }

            // The book score is the plain average of the chapter scores
            bookScore /= numChapters;
            Console.WriteLine();
            Console.WriteLine("Average book sentiment: " + bookScore);
        }
    }
}
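This code does a few different things: it reads the EPUB file, extracts the text of each chapter (and its subchapters), splits that text into chunks, sends the chunks to the sentiment API, and averages the returned scores per chapter and for the whole book.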
The code should be self-explanatory, but a few comments may be in order.
Firstly, chopping a chapter into chunks and taking the chapter sentiment to be the average of the chunk sentiments is probably not all that sound mathematically or statistically. Specifically, sentiment is probably not linear, so the result could vary depending on how the chapter is chopped. Moreover, this problem is not likely to "average out" over many chapters or books. It is beyond the scope of this little tutorial to go into details, but one could use this tool to investigate further by varying the chunk size and comparing the results. I have also somewhat arbitrarily chosen 3k characters as the chunk size. This choice was not based on any rigorous analysis. It simply had to be small enough to fit within the limits of the Cognitive Services API (10KB of data) while being large enough that I don't make too many calls to the API (thus incurring large costs). It is easy to play with these settings in the application.
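As a rough starting point for that kind of experiment, here is a minimal sketch of the chunking step pulled out into a helper with a configurable maximum size, plus a length-weighted average as one possible alternative to the plain mean. The names ChunkingSketch, SplitIntoChunks, and WeightedAverage are hypothetical and are not part of the application above, and weighting by length is only an assumption about what might be a fairer average, not a recommendation from the API documentation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original application.
public static class ChunkingSketch
{
    // Splits text into the fewest near-equal pieces of at most maxCharacters each.
    public static List<string> SplitIntoChunks(string text, int maxCharacters)
    {
        var chunks = new List<string>();
        if (string.IsNullOrEmpty(text) || maxCharacters <= 0)
        {
            return chunks;
        }

        int count = (text.Length + maxCharacters - 1) / maxCharacters; // ceiling division
        int size = (text.Length + count - 1) / count;                  // near-equal chunk length
        for (int offset = 0; offset < text.Length; offset += size)
        {
            // The last chunk takes whatever is left
            chunks.Add(text.Substring(offset, Math.Min(size, text.Length - offset)));
        }
        return chunks;
    }

    // Averages scores, weighting each score by the length of the text it was computed on.
    public static double WeightedAverage(IReadOnlyList<double> scores, IReadOnlyList<int> lengths)
    {
        if (scores.Count != lengths.Count)
        {
            throw new ArgumentException("scores and lengths must have the same number of items");
        }

        double weightedSum = 0.0;
        long totalLength = 0;
        for (int i = 0; i < scores.Count; ++i)
        {
            weightedSum += scores[i] * lengths[i];
            totalLength += lengths[i];
        }
        return totalLength == 0 ? double.NaN : weightedSum / totalLength;
    }
}
```

With scores of 0.2 and 0.8 on texts of 100 and 300 characters, the weighted average is 0.65, while the plain mean is 0.5. Within a single chapter the chunks are nearly equal in length, so the two averages will be close. The difference is more likely to show up at the book level, where a two-line "About" page currently counts as much as a long chapter.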
I am using the .NET API for Cognitive Services in this example. A different way to do this is through the REST API, which would be more generic and probably make it easier for people to port this code to other languages, but the .NET API provided an easy way to make this a very compact code example.
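For readers who want to try the REST route, the following is a minimal sketch of how the request could be put together with HttpClient types and System.Text.Json. It only builds the request and does not send it. The class and method names are hypothetical, the JSON layout is my assumption based on the v2.0 Text Analytics sentiment API, and the endpoint is left as a parameter because it depends on the region and API version of your resource.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Text;
using System.Text.Json;

// Hypothetical helper, not part of the original application.
public static class RestSketch
{
    // Builds a JSON body with one document per chunk of text.
    // Assumed shape: {"documents":[{"id":"0","language":"en","text":"..."}]}
    public static string BuildRequestBody(IEnumerable<string> chunks)
    {
        var documents = chunks
            .Select((text, i) => new { id = i.ToString(), language = "en", text })
            .ToList();
        return JsonSerializer.Serialize(new { documents });
    }

    // Builds the POST request. The endpoint is an assumption you must supply,
    // for example a URL ending in /text/analytics/v2.0/sentiment for your region.
    public static HttpRequestMessage BuildRequest(string endpoint, string apiKey, string body)
    {
        var request = new HttpRequestMessage(HttpMethod.Post, endpoint)
        {
            Content = new StringContent(body, Encoding.UTF8, "application/json")
        };
        // Cognitive Services expects the key in this header
        request.Headers.Add("Ocp-Apim-Subscription-Key", apiKey);
        return request;
    }
}
```

Sending the message with HttpClient.SendAsync and reading the scores out of the JSON response would complete the picture. The same body and header should carry over to other languages with little change, which is the main attraction of this approach.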
I make no attempt to deal with books in languages other than English. The Cognitive Services API could actually be used to detect the language and get the sentiment for the appropriate language, or to translate the text before calling the sentiment API. Again, that would have made for a more elaborate example, and in the interest of brevity this example only works for books in English.
So now that we have a sentiment analyzer, let's take it for a spin. I have chosen "A Christmas Carol" by Charles Dickens. It is available in public domain form. Running the analyzer on it would look something like this:
PS> dotnet.exe .\EpubSentiment.dll C:\temp\christmas_carol.epub <API KEY>
Analyzing book: C:\temp\christmas_carol.epub
Book title: A Christmas Carol
0: Title, score: 0.5
1: About, score: 0.999999523162842
2: Chapter 1 - Marley's Ghost, score: 0.228464378760411
3: Chapter 2 - The First Of The Three Spirits, score: 0.561749743918578
4: Chapter 3 - The Second Of The Three Spirits, score: 0.87196253426373
5: Chapter 4 - The Last Of The Spirits, score: 0.384314155578613
6: Chapter 5 - The End Of It, score: 0.999999988079071
Average book sentiment: 0.649498617680464
So we see that once you get past the "About", it is actually a bit of a downer, with the exception of Chapter 3, which is mostly positive (in sentiment). Chapter 5 (The End Of It) is very positive. This is pretty much how I remember that book, so it makes sense.
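The "Title" and "About" entries are front matter rather than part of the story, yet they count toward the book average. As a small sketch of how much they matter, the helper below recomputes the average from the scores printed above, with and without selected entries. The names ScoreSummary, ChristmasCarol, and AverageExcluding are hypothetical, and deciding which titles count as front matter is an assumption you would have to make for each book.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original application.
public static class ScoreSummary
{
    // Chapter scores copied from the console output above
    public static readonly (string Title, double Score)[] ChristmasCarol =
    {
        ("Title", 0.5),
        ("About", 0.999999523162842),
        ("Chapter 1 - Marley's Ghost", 0.228464378760411),
        ("Chapter 2 - The First Of The Three Spirits", 0.561749743918578),
        ("Chapter 3 - The Second Of The Three Spirits", 0.87196253426373),
        ("Chapter 4 - The Last Of The Spirits", 0.384314155578613),
        ("Chapter 5 - The End Of It", 0.999999988079071),
    };

    // Plain average of the chapter scores, leaving out any chapter whose title is listed.
    public static double AverageExcluding(
        IEnumerable<(string Title, double Score)> chapters,
        params string[] excludedTitles)
    {
        var kept = chapters
            .Where(c => !excludedTitles.Contains(c.Title))
            .Select(c => c.Score)
            .ToList();
        return kept.Count == 0 ? double.NaN : kept.Average();
    }
}
```

Calling AverageExcluding(ScoreSummary.ChristmasCarol) reproduces the book average of about 0.649 reported above. Calling it with "Title" and "About" excluded gives about 0.609 for the five story chapters alone, so the very positive "About" page nudges the overall figure upward.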
Obviously the sentiment of the text and the actual feel and message of the book may not be the same. One could imagine some pretty negative language in a book that is ultimately inspiring and uplifting, and vice versa, but the sentiment analysis provides one type of data point on the sentiment of the book.
Give it a try on some of your favorite books and let me know what you find.