Sentiment Analysis of US Presidential Inaugural Addresses

In this blog post, I will take a look at how we can use the Azure Cognitive Services Text Analytics API to analyze speeches in terms of their sentiment. This is a continuation of my previous explorations of the Text Analytics API. I previously made an e-book sentiment analyzer, and I encourage you to read that post for more details on how to use the API. I will be analyzing all US presidential inaugural addresses from George Washington up through Donald J. Trump. Keep reading; the results may surprise you.

I am not trying to make any political statements with this analysis. It is simply an example of using Cognitive Services to look at a collection of speeches.  

The source of the data used in this analysis can be found on the American Presidency Project website. I will illustrate with some code how one can extract the actual text of each inaugural address. The source code is available on GitHub, and the most important pieces will be reproduced in the blog post.

TL;DR - Just Give Me The Results

I know that some readers will just be here for the results and will not care much about the implementation, so I will spare them the agony of reading through the entire post. Here are the results:

Summary statistics:

Mean 0.82
Median 0.84
Standard Deviation 0.15

The 10 most positive (in terms of sentiment) inaugural addresses:

Rank President Date Sentiment
1 Thomas Jefferson March 4, 1801 1.00
2 William Henry Harrison March 4, 1841 1.00
3 Theodore Roosevelt March 4, 1905 1.00
4 George Washington April 30, 1789 1.00
5 Franklin Pierce March 4, 1853 0.99
6 James Monroe March 4, 1817 0.98
7 Zachary Taylor March 5, 1849 0.98
8 Woodrow Wilson March 4, 1913 0.98
9 Ulysses S. Grant March 4, 1869 0.98
10 Grover Cleveland - I March 4, 1885 0.97

The 10 most negative (in terms of sentiment) inaugural addresses:

Rank President Date Sentiment
1 George Washington March 4, 1793 0.42
2 James Madison March 4, 1813 0.44
3 Abraham Lincoln March 4, 1865 0.49
4 Lyndon B. Johnson January 20, 1965 0.49
5 John F. Kennedy January 20, 1961 0.53
6 Barack Obama January 20, 2009 0.57
7 Thomas Jefferson March 4, 1805 0.65
8 George W. Bush January 20, 2001 0.67
9 Andrew Jackson March 4, 1833 0.70
10 Franklin D. Roosevelt January 20, 1945 0.71

One observation is that Donald J. Trump's recent "American carnage" inaugural address is not on either top 10 list. It has a score of 0.72, which is a bit below the average for inaugural addresses, but not by much. This agrees with other analyses of that address, which have found it to be relatively positive. In the broader context of speeches, 0.72 is still a relatively positive score. You can find it on the complete list of scores at the end of this post.

Barack Obama's first inaugural address and John F. Kennedy's inaugural address are both among the 10 most negative, which may surprise some. However, if you read those addresses, I think you may agree that while they may be great speeches (this blogger does not presume to have an opinion about that), they also talk about some serious problems and challenges, which of course influences the language. It just shows that a speech can be inspirational (or aspirational) without necessarily having a positive sentiment. Obama's second address was much more positive than the first.

Another observation is that there are more modern presidents on the negative top 10 list, which is also visible if we plot the sentiment as a function of year:

[Figure: sentiment score of each inaugural address plotted against the year it was delivered]

It would appear that there is a slight downward trend in sentiment, which may just be a reflection of language changes over the years. We would need some more data from other types of speeches to dig into that. I will leave you to figure out when the addresses will start to be real downers if the current trend continues.

I will also leave the political analysis to others. The purpose of this is just to illustrate that one can get interesting trends and information with relatively little work using Cognitive Services.

Implementation - The Gory Details

If you would like to do this yourself, it is actually relatively easy. I have written a C# utility library that you can use to analyze speeches or other text documents. The library is called "SpielInsights", since calling it SpeechInsights might cause it to be confused with something that analyzes spoken words (as in audio). The library will give you the sentiment of each paragraph of text and also the key phrases in each paragraph. It will also provide you with summary information for the entire document. In this blog post, I am really just reporting the overall sentiment of the document. This overall sentiment is calculated as a weighted average of the sentiment of all paragraphs in the text, where each paragraph is weighted by its relative length in characters. You can find the source code for this utility library here.
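As a rough illustration of that weighted average, here is a minimal sketch. It is a simplified stand-in rather than the actual library code, and the names WeightedSentimentSketch and WeightedAverage are made up for this example. Each paragraph contributes its sentiment multiplied by its character count, and the sum is divided by the total number of characters:

```csharp
using System;
using System.Collections.Generic;

public static class WeightedSentimentSketch
{
    // Hypothetical helper: each tuple is (sentiment score, paragraph length in characters).
    public static double WeightedAverage(
        IEnumerable<(double Sentiment, long Characters)> paragraphs)
    {
        double weightedSum = 0.0;
        long totalCharacters = 0;

        foreach (var (sentiment, characters) in paragraphs)
        {
            // Longer paragraphs count proportionally more toward the overall score.
            weightedSum += sentiment * characters;
            totalCharacters += characters;
        }

        // With no text at all, return NaN instead of dividing by zero.
        return totalCharacters == 0 ? double.NaN : weightedSum / totalCharacters;
    }
}
```

For example, a 100-character paragraph with sentiment 0.9 and a 300-character paragraph with sentiment 0.5 give (0.9 * 100 + 0.5 * 300) / 400 = 0.6, so the longer, more negative paragraph dominates.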

In order to use the library, you will need to load your text into a "Spiel" object:

    public class Spiel
    {

        public Spiel()
        {
            Paragraphs = new List<string>();
        }

        public string SourceURI { get; set; }

        public string Speaker { get; set; }

        public string Category { get; set; }

        public DateTime Date { get; set; }

        public List<string> Paragraphs { get; set; }
    }

This contains a bit of information about the speaker and the time when the speech was given and a list of text paragraphs. This object can be passed to the AnalyzeSpiel function:

        public static SpielAnalytics AnalyzeSpiel(Spiel spiel, string apiKey)
        {
            ...
        }

AnalyzeSpiel makes as many calls to the Text Analytics API as needed (based on the length of the text) and then returns some analytics:

    public class SpielParagraphAnalytics
    {
        public SpielParagraphAnalytics()
        {
            KeyPhrases = new HashSet<string>();
            Sentiment = 0;
            Words = 0;
            Characters = 0;
        }

        public long Words { get; set; }

        public long Characters { get; set; }

        public double Sentiment { get; set; }

        public HashSet<string> KeyPhrases { get; set; }
    }

    public class SpielAnalytics
    {
        public SpielAnalytics()
        {
            SummaryAnalytics = new SpielParagraphAnalytics();
            ParaGraphAnalytics = new List<SpielParagraphAnalytics>();
        }

        public SpielParagraphAnalytics SummaryAnalytics { get; set; }
        public List<SpielParagraphAnalytics> ParaGraphAnalytics { get; set; }
    }
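
To give an idea of how a function like AnalyzeSpiel could make "as many calls as needed", here is a minimal sketch of splitting the paragraphs into batches, one batch per API call. This is a simplified illustration and not the library's actual code: BatchingSketch, SplitIntoBatches and the two limit parameters are hypothetical, and the real limits should be taken from the Text Analytics documentation.

```csharp
using System;
using System.Collections.Generic;

public static class BatchingSketch
{
    // Splits paragraphs into batches so that no batch holds more than maxDocuments
    // paragraphs or more than maxCharacters characters in total. Both limits are
    // assumed values supplied by the caller.
    public static List<List<string>> SplitIntoBatches(
        IEnumerable<string> paragraphs, int maxDocuments, int maxCharacters)
    {
        if (maxDocuments <= 0) throw new ArgumentOutOfRangeException(nameof(maxDocuments));
        if (maxCharacters <= 0) throw new ArgumentOutOfRangeException(nameof(maxCharacters));

        var batches = new List<List<string>>();
        var current = new List<string>();
        int currentCharacters = 0;

        foreach (string paragraph in paragraphs)
        {
            bool batchIsFull = current.Count >= maxDocuments
                || (current.Count > 0 && currentCharacters + paragraph.Length > maxCharacters);

            if (batchIsFull)
            {
                // Close the current batch and start a new one.
                batches.Add(current);
                current = new List<string>();
                currentCharacters = 0;
            }

            // A single paragraph longer than maxCharacters still gets a batch of its own;
            // splitting inside a paragraph is left out of this sketch.
            current.Add(paragraph);
            currentCharacters += paragraph.Length;
        }

        if (current.Count > 0) batches.Add(current);
        return batches;
    }
}
```

Each returned batch would then be sent in one request, and the per-paragraph results collected into the ParaGraphAnalytics list.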

Now that we have the basic structures and functions laid out, we need the data. I will just spend a bit of time here to show how I did the retrieval and cleaning of the data. As mentioned, the inaugural addresses can be found on the American Presidency Project website. However, I could not find a database of the raw text, so I have made a list of the speeches with links to their individual pages. You can find that list here. An excerpt from the list looks like this:

George Washington;https://www.presidency.ucsb.edu/ws/index.php?pid=25800;April 30, 1789
George Washington;https://www.presidency.ucsb.edu/ws/index.php?pid=25801;March 4, 1793
John Adams;https://www.presidency.ucsb.edu/ws/index.php?pid=25802;March 4, 1797
Thomas Jefferson;https://www.presidency.ucsb.edu/ws/index.php?pid=25803;March 4, 1801
Thomas Jefferson;https://www.presidency.ucsb.edu/ws/index.php?pid=25804;March 4, 1805
James Madison;https://www.presidency.ucsb.edu/ws/index.php?pid=25805;March 4, 1809
James Madison;https://www.presidency.ucsb.edu/ws/index.php?pid=25806;March 4, 1813
James Monroe;https://www.presidency.ucsb.edu/ws/index.php?pid=25807;March 4, 1817
James Monroe;https://www.presidency.ucsb.edu/ws/index.php?pid=25808;March 4, 1821

    ...
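
Each line in the list has three semicolon-separated fields: speaker, URL and date. As a small sketch of how such a line can be parsed (the names InauguralListSketch and ParseLine are made up for this example, and the date format string is my assumption based on the excerpt above):

```csharp
using System;
using System.Globalization;

public static class InauguralListSketch
{
    // Parses one "Speaker;URL;Date" line from the list into its three fields.
    public static (string Speaker, string SourceUri, DateTime Date) ParseLine(string line)
    {
        string[] components = line.Trim().Split(';');
        if (components.Length != 3)
        {
            throw new FormatException("Expected 3 semicolon-separated fields: " + line);
        }

        // Dates in the list look like "April 30, 1789". Parsing with the invariant
        // culture keeps the result independent of the machine's regional settings.
        DateTime date = DateTime.ParseExact(
            components[2].Trim(), "MMMM d, yyyy", CultureInfo.InvariantCulture);

        return (components[0].Trim(), components[1].Trim(), date);
    }
}
```

Validating the number of fields up front means a malformed line fails with a clear message instead of an index error later on.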

I have then written a small C# (.NET Core) routine that runs through this list, finds the part of each HTML document that contains the speech text, and puts each paragraph into the "Spiel" structure before sending it to the analysis routine. You can find the source code of this routine here, and since it is pretty short, I have reproduced it below:

using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;
using SpielInsights;
using System.Collections.Generic;
using System.Text;

namespace Inaugurals
{
    class Program
    {
        static void Main(string[] args)
        {
            string inputList = args[0];
            string apiKey = args[1];
            string outputFileName = args[2];

            var client = new HttpClient();

            //Output file
            System.IO.StreamWriter outputFile = new System.IO.StreamWriter(outputFileName);

            Task.Run(async () =>
            {
                using (StreamReader reader = new StreamReader(inputList))
                {
                    string line;
                    while ((line = reader.ReadLine()) != null)
                    {
                        Spiel spiel = new Spiel();

                        string[] components = line.Split(";");

                        spiel.Speaker = components[0];
                        spiel.SourceURI = components[1];
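                        // Parse with the invariant culture so dates such as "April 30, 1789"
                        // are read the same way regardless of the machine's regional settings
                        spiel.Date = DateTime.Parse(components[2], System.Globalization.CultureInfo.InvariantCulture);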

                        Console.WriteLine("Processing Inaugural: " + components[1]);
                        var response = await client.GetAsync(components[1]);
                        var content = await response.Content.ReadAsStringAsync();

                        HtmlDocument htmlDocument = new HtmlDocument();
                        htmlDocument.LoadHtml(content);

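                        // SelectNodes returns null when nothing matches, so fall back to an empty sequence
                        foreach (HtmlNode node in 
                                 htmlDocument.DocumentNode.SelectNodes("//span[@class='displaytext']")
                                 ?? System.Linq.Enumerable.Empty<HtmlNode>())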
                        {
                            HtmlDocument innerHtmlDocument = new HtmlDocument();
                            innerHtmlDocument.LoadHtml(node.InnerHtml);

                            foreach (HtmlNode pnode in innerHtmlDocument.DocumentNode.SelectNodes("//text()"))
                            {
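                                string paragraphText = pnode.InnerText.Trim();
                                // Skip whitespace-only text nodes so no empty paragraphs are analyzed
                                if (paragraphText.Length > 0)
                                {
                                    spiel.Paragraphs.Add(paragraphText);
                                }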
                            }

                            SpielAnalytics analytics = SpielInsights.SpielInsights.AnalyzeSpiel(spiel, apiKey);


                            //Build semicolon separated output records
                            StringBuilder osb = new StringBuilder();
                            osb.Append(components[0] + ";"); //Speaker
                            osb.Append(components[2] + ";"); //Date
                            osb.Append(analytics.SummaryAnalytics.Sentiment);

                            outputFile.WriteLine(osb.ToString());
                            Console.WriteLine(osb.ToString());
                        }
                    }
                }
            }).GetAwaiter().GetResult();

            outputFile.Close();
        }
    }
}

This routine simply loops through all the lines in the list of inaugural addresses, uses an HttpClient to retrieve each page, extracts the <span> in the HTML document that contains the actual speech text, and loops through each text section within it. The sections/paragraphs are added to the "Spiel", and then we retrieve the analytics.

As you can see there is a lot more information in the analytics structures than I have presented here, but I will leave it to you to play more with it and see what interesting stuff you find. I may find some time to post it online in a way that can be searched and browsed. If I do, I will write about it on my blog, so please subscribe.

That's it. Have fun analyzing speeches or playing with the data presented here. Please rate the blog and post any comments you may have.

Complete Results

For completeness, I am adding all the sentiment scores for all inaugural addresses in chronological order.

President Date Sentiment
George Washington April 30, 1789 1.00
George Washington March 4, 1793 0.42
John Adams March 4, 1797 0.95
Thomas Jefferson March 4, 1801 1.00
Thomas Jefferson March 4, 1805 0.65
James Madison March 4, 1809 0.85
James Madison March 4, 1813 0.44
James Monroe March 4, 1817 0.98
James Monroe March 4, 1821 0.92
John Quincy Adams March 4, 1825 0.94
Andrew Jackson March 4, 1829 0.93
Andrew Jackson March 4, 1833 0.70
Martin Van Buren March 4, 1837 0.75
William Henry Harrison March 4, 1841 1.00
James K. Polk March 4, 1845 0.91
Zachary Taylor March 5, 1849 0.98
Franklin Pierce March 4, 1853 0.99
James Buchanan March 4, 1857 0.97
Abraham Lincoln March 4, 1861 0.78
Abraham Lincoln March 4, 1865 0.49
Ulysses S. Grant March 4, 1869 0.98
Ulysses S. Grant March 4, 1873 0.81
Rutherford B. Hayes March 5, 1877 0.92
James Garfield March 4, 1881 0.77
Grover Cleveland - I March 4, 1885 0.97
Benjamin Harrison March 4, 1889 0.90
Grover Cleveland - II March 4, 1893 0.83
William McKinley March 4, 1897 0.91
William McKinley March 4, 1901 0.87
Theodore Roosevelt March 4, 1905 1.00
William Howard Taft March 4, 1909 0.94
Woodrow Wilson March 4, 1913 0.98
Woodrow Wilson March 4, 1917 0.72
Warren G. Harding March 4, 1921 0.78
Calvin Coolidge March 4, 1925 0.92
Herbert Hoover March 4, 1929 0.78
Franklin D. Roosevelt March 4, 1933 0.86
Franklin D. Roosevelt January 20, 1937 0.77
Franklin D. Roosevelt January 20, 1941 0.74
Franklin D. Roosevelt January 20, 1945 0.71
Harry S. Truman January 20, 1949 0.81
Dwight D. Eisenhower January 20, 1953 0.84
Dwight D. Eisenhower January 21, 1957 0.86
John F. Kennedy January 20, 1961 0.53
Lyndon B. Johnson January 20, 1965 0.49
Richard Nixon January 20, 1969 0.74
Richard Nixon January 20, 1973 0.80
Jimmy Carter January 20, 1977 0.85
Ronald Reagan January 20, 1981 0.73
Ronald Reagan January 21, 1985 0.80
George Bush January 20, 1989 0.83
William J. Clinton January 20, 1993 0.76
William J. Clinton January 20, 1997 0.87
George W. Bush January 20, 2001 0.67
George W. Bush January 20, 2005 0.85
Barack Obama January 20, 2009 0.57
Barack Obama January 21, 2013 0.84
Donald J. Trump January 20, 2017 0.72
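
If you want to recompute the summary statistics or the trend line from this table yourself, a minimal sketch could look like the following. The names SentimentStatsSketch, Mean, Median, StandardDeviation and LinearFit are made up for this example. The sketch uses the sample standard deviation (dividing by n - 1) and an ordinary least-squares line; both are assumptions on my part about which variants are appropriate here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SentimentStatsSketch
{
    public static double Mean(IReadOnlyList<double> values) => values.Average();

    public static double Median(IReadOnlyList<double> values)
    {
        double[] sorted = values.OrderBy(v => v).ToArray();
        int mid = sorted.Length / 2;
        // Odd count: the middle value. Even count: the average of the two middle values.
        return sorted.Length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    // Sample standard deviation (divides by n - 1); requires at least two values.
    public static double StandardDeviation(IReadOnlyList<double> values)
    {
        if (values.Count < 2) throw new ArgumentException("Need at least two values.");
        double mean = values.Average();
        double sumOfSquares = values.Sum(v => (v - mean) * (v - mean));
        return Math.Sqrt(sumOfSquares / (values.Count - 1));
    }

    // Ordinary least-squares fit of y = Intercept + Slope * x.
    public static (double Slope, double Intercept) LinearFit(
        IReadOnlyList<double> x, IReadOnlyList<double> y)
    {
        if (x.Count != y.Count || x.Count < 2)
        {
            throw new ArgumentException("x and y must have the same length (at least 2).");
        }

        double meanX = x.Average();
        double meanY = y.Average();
        double covariance = 0.0;
        double varianceX = 0.0;

        for (int i = 0; i < x.Count; i++)
        {
            covariance += (x[i] - meanX) * (y[i] - meanY);
            varianceX += (x[i] - meanX) * (x[i] - meanX);
        }

        double slope = covariance / varianceX;
        return (slope, meanY - slope * meanX);
    }
}
```

Passing the years as x and the sentiment scores as y to LinearFit gives the slope of the trend in sentiment points per year. Solving Intercept + Slope * year for a chosen threshold then tells you when the fitted line would cross it.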