Get your analytics to the next level with Azure Cognitive Services

Editor’s note: The following post was written by Office Servers & Services MVP Stéphane Eyskens as part of our Technical Tuesday series with support from his technical editors, Windows Development MVP Toni Pohl and Office Servers and Services MVP Martina Grom.

Azure Cognitive Services is a set of APIs that offer many possibilities for Natural Language Processing (NLP) and text/media mining in general, such as extracting keywords and topics from unstructured documents or detecting emotion. At the time of writing, they are still in preview.

In this article, I’m going to illustrate the power of these services and bring Yammer Analytics to the next level. Yammer is a good use case because it contains tons of information about employees and partners that can help identify trends within an organization.

Although Yammer lets you use topics (i.e. hashtags), people don’t always take advantage of them. So, what if you could identify new trends in your reporting instead? What if you could do some sentiment analysis and learn about the overall atmosphere? And perhaps more pertinently, what if you could mine content that deviates from the company policy?

Azure Cognitive Services make all of this possible.

In this article, I will guide you through some of these APIs and demonstrate how to use them against Yammer data. We will learn:

  • How to identify whether Yammer group conversations are positive or negative.
  • What the most discussed topics for a given Yammer group are.
  • How to extract information out of pictures, and identify the ones that potentially deviate from the company policy.

To cover the above, I’ll be consuming the Text Analytics and Computer Vision APIs.

And guess what: you can easily give it a try yourself! They both come with a free Azure plan.

Getting things ready

In a nutshell, V2 APIs are now exposed through Azure API Management. Therefore, before consuming the V2 APIs, you must first create the subscriptions in your Azure environment.

Subscribing to APIs

  • Connect to https://portal.azure.com/. Click on the plus sign in the top left and type “cognitive” in the search box. You should be redirected to the following screen:

figure1

  • Click on the Create button. Select the required API type and the appropriate plan. Note that both Text Analytics and Computer Vision are currently available only in the West US region. That means before you complete the above in your production environment, please be sure that your governance policy allows it.

Once created, you’ll find the important bits in the Keys section of the API component: figure2

Note that the Quick start section refers to additional documentation and code samples.

So at this stage, the most important action is to grab the keys, which you’ll reuse when querying the APIs. Each API has its own key.

Creating a Yammer App

Since our example is based on Yammer data, you’ll have to register an App for Yammer. You can do this in two ways: using an Azure Active Directory App, or using a Yammer App. Since we plan to consume the Yammer data from a background process (to prepare reporting data), a Yammer App is probably the easiest way to go. As an alternative, you could create an Azure Active Directory App with appropriate delegated permissions and use the Password Credentials Flow from your code. But for the sake of this article, I’ll focus on the easiest approach.

figure3

  • Once the App is created, click on the link labelled Generate a developer token for this application.

The token you now have was issued for both a verified Yammer admin and the App. You should of course not disclose it, and you should store it in a safe manner. Note that Yammer tokens never expire unless some event explicitly invalidates them (the App or the user is removed, for instance). This token is the Access Token you’ll have to include whenever talking to Yammer’s REST API.
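One simple way to keep the token and the API keys out of the source code is to read them from environment variables. The following is only a minimal sketch: the `Secrets` class and the variable name `YAMMER_TOKEN` are hypothetical names chosen for illustration, not something Yammer or Azure prescribes.

```csharp
using System;

static class Secrets
{
    // Reads a required secret from an environment variable and fails fast when it is missing.
    // The variable names passed in are up to you (hypothetical in this sketch).
    public static string Require(string name)
    {
        string value = Environment.GetEnvironmentVariable(name);
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException("Missing environment variable: " + name);
        }
        return value.Trim();
    }

    // Builds the Authorization header value sent with every Yammer REST call.
    public static string BearerHeader(string token)
    {
        return "Bearer " + token;
    }
}
```

With this in place, the header value could be obtained with Secrets.BearerHeader(Secrets.Require("YAMMER_TOKEN")) instead of a hard-coded string.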

Writing the code

Now that the environment is ready, it’s time to consume both Yammer and Azure Cognitive Services APIs. Remember that our objective is to collect messages and files of a Yammer group in order to detect topics, sentiment and images that potentially deviate from the company policy, such as adult-only content.

But first things first, let’s try out a few queries to make sure everything works fine. In a Console Program (which you could turn into an Azure WebJob later), make sure to include the following NuGet packages: Microsoft.ProjectOxford.Vision and Nito.AsyncEx. The latter is optional and helps run asynchronous code from console programs. To add these packages to your project, simply right click on the References node ➔ Manage NuGet Packages ➔ and browse for the ones mentioned above.

Testing the Text Analytics API (Sentiment)

// Requires: using System; using System.IO; using System.Net; using System.Text;
// A single test document, serialized as the JSON body expected by the API.
byte[] data = Encoding.UTF8.GetBytes(
    "{\"documents\":[{\"id\":\"1\",\"text\":\"This should rock!\"}]}");
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(
    "https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment");
req.Headers.Add("Ocp-Apim-Subscription-Key", "{subscriptionkey}");
req.Accept = "application/json";
req.Method = "POST";
req.ContentType = "application/json";
req.ContentLength = data.Length;

// Write the body and close the request stream before reading the response.
using (Stream body = req.GetRequestStream())
{
    body.Write(data, 0, data.Length);
}

using (StreamReader sr = new StreamReader(req.GetResponse().GetResponseStream()))
{
    Console.WriteLine(sr.ReadToEnd());
}

Running the above code will result in the following JSON response: figure4

As you have probably guessed, the next step will be to analyze Yammer conversations.
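Before moving on, here is a hedged sketch of how the sentiment response could be turned into a lookup of scores per document. It assumes the response exposes a documents array whose items carry an id and a score (the same fields the full listing later in this article reads) and it uses Json.NET (Newtonsoft.Json). The `SentimentResponse` class is a name invented for this example.

```csharp
using System;
using System.Collections.Generic;
using Newtonsoft.Json.Linq;

static class SentimentResponse
{
    // Maps each document id to its sentiment score.
    // Assumed response shape: {"documents":[{"id":"1","score":0.93}], ...}
    public static Dictionary<string, double> ParseScores(string json)
    {
        var scores = new Dictionary<string, double>();
        JObject root = JObject.Parse(json);
        // Fall back to an empty array when the response carries no documents.
        JToken documents = root["documents"] ?? new JArray();
        foreach (JToken document in documents)
        {
            scores[(string)document["id"]] = (double)document["score"];
        }
        return scores;
    }
}
```

Scores close to 1 indicate a positive text and scores close to 0 a negative one; the value 0.93 in the comment is only an illustrative number.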

Testing the Yammer token and the Computer Vision API at the same time (Image analysis)

Locate a target Yammer group, as well as a picture that was uploaded into it that you’d like to test. In order to get the picture information easily, just use the UI, go to the Files section of a given group, click on a picture and copy the file ID from the URL. Once done, you can run the following code:

// This snippet uses await, so it must run inside an async method.
HttpClient cli = new HttpClient();
cli.DefaultRequestHeaders.Add("Authorization", "Bearer {YammerToken}");
// Download the picture from Yammer and pass the stream straight to the Computer Vision API.
using (Stream image =
    await cli.GetStreamAsync("https://www.yammer.com/api/v1/uploaded_files/{id}/download"))
{
    VisionServiceClient vscli = new VisionServiceClient("{subscriptionkey}");
    // Only ask for the adult-content flags and the tags.
    VisualFeature[] features = new VisualFeature[] {
        VisualFeature.Adult, VisualFeature.Tags };
    AnalysisResult result = await vscli.AnalyzeImageAsync(image, features);
}

This should result in something similar to this: figure5

Note that you can get much more from the Computer Vision API. I explicitly restricted the visual features to Adult Content and Tags. Of course, you must replace values between curly braces with your own.

Writing the actual code

If you got to this stage, it means that you managed to configure and test the various APIs properly. It’s now time to write the actual code. Before we go further, it is important to note that one of the operations we aim to perform, namely Topic Detection, is a bit harder than the others. Indeed, while Sentiment and Image analysis are quite straightforward, Topic Detection relies on a background job (developed by Microsoft) running in Azure. This means that after sending data to this operation, you must wait until the job completes before fetching the actual results. It is therefore a two-step operation: the first step consists of sending the data for analysis, while the second step consists of checking whether the background operation has completed.

Right after having sent data to the Topic Detection operation, the answer will be as follows: figure6

And by querying the URL returned by the Operation-Location HTTP response header, you’ll see its status: figure7

The status may be one of the following values: NotStarted, Running, Succeeded, Failed. Once the operation completes successfully, the status will be Succeeded and the results will be available as part of the answer.
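Since the job may take a while, it is worth bounding the wait. The sketch below shows one possible polling loop with a timeout. It deliberately takes the status check as a delegate, so it makes no assumption about how the status is fetched; `OperationPoller` and the "TimedOut" label are names invented for this example and are not part of the API.

```csharp
using System;
using System.Threading.Tasks;

static class OperationPoller
{
    // Polls getStatus until it reports Succeeded or Failed, or until the timeout elapses.
    // Returns the final status, or "TimedOut" (a label made up for this sketch).
    public static async Task<string> WaitForCompletionAsync(
        Func<Task<string>> getStatus, TimeSpan interval, TimeSpan timeout)
    {
        DateTime deadline = DateTime.UtcNow + timeout;
        while (true)
        {
            string status = await getStatus();
            if (string.Equals(status, "Succeeded", StringComparison.OrdinalIgnoreCase) ||
                string.Equals(status, "Failed", StringComparison.OrdinalIgnoreCase))
            {
                return status;
            }
            // Give up when the next poll would happen after the deadline.
            if (DateTime.UtcNow + interval > deadline)
            {
                return "TimedOut";
            }
            await Task.Delay(interval);
        }
    }
}
```

In the context of this article, getStatus would query the Operation-Location URL and return the value of the status field.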

Now that the process is clear, let’s write some code. The code below is a console application that analyzes the data of a given Yammer group. In the group, I created 102 conversations with random content coming from Wikipedia. The console application builds some statistics out of the results. For brevity, I’m showing raw results; the ideal scenario is to store them in a database and run some Power BI reports on top of them.

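// Namespaces required by this listing.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ProjectOxford.Vision;
using Microsoft.ProjectOxford.Vision.Contract;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
using Nito.AsyncEx;

// Everything collected for one Yammer group.
class YammerGroup
{
    public List<YammerMessage> Messages { get; set; }
    public List<YammerImageFile> Files { get; set; }
    public List<Topic> Topics { get; set; }
}
// A topic returned by the Topic Detection operation.
class Topic
{
    public string Label { get; set; }
    public double Score { get; set; }
}
// A Yammer message together with its Text Analytics results.
class YammerMessage
{
    public string Id { get; set; }
    public string Body { get; set; }
    public double SentimentScore { get; set; }
    public List<string> KeyPhrases { get; set; }
}
// An image uploaded to the group together with its Computer Vision results.
class YammerImageFile
{
    public string Id { get; set; }
    public string Download_Url { get; set; }
    public string Name { get; set; }
    public List<string> Tags { get; set; }
    public bool IsAdultContent { get; set; }
    public bool IsRacyContent { get; set; }
}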
class Program
{
    // Fill in your own keys, token and group ID before running.
    static string ComputerVisionKey = "";
    static string TextAnalyticsKey = "";
    static string YammerToken = "Bearer "; // append your Yammer developer token
    static string YammerGroupId = "";

    static void Main(string[] args)
    {
        // AsyncContext (Nito.AsyncEx) lets a console program wait for asynchronous code.
        AsyncContext.Run(() => MainAsync(args));
    }

    // Returns Task rather than void so that exceptions propagate to the caller.
    static async Task MainAsync(string[] args)
    {
        YammerGroup group = new YammerGroup();
        // Retrieve the conversations of the given group.
        List<YammerMessage> messages = GetYammerMessages(YammerGroupId);
        // Prepare the JSON dataset shared by the Text Analytics operations.
        byte[] data = GetAPIData(messages);
        // Key phrase extraction against the Yammer messages.
        await KeyPhraseOrSentiment(data, messages);
        // Sentiment analysis against the Yammer messages.
        await KeyPhraseOrSentiment(data, messages, true);
        group.Messages = messages;
        // Retrieve and analyze the images of that group.
        List<YammerImageFile> files = await GetImageFiles(YammerGroupId);
        await DownloadAndAnalyzeYammerFile(files);
        group.Files = files;

        Console.WriteLine("---------------TOPIC DETECTION-----------------");
        if (await GetGroupTopics(data, group))
        {
            var MostDiscussedTopics = group.Topics.OrderByDescending(t => t.Score);
            foreach (var MostDiscussedTopic in MostDiscussedTopics)
            {
                Console.WriteLine("{0} {1}",
                    MostDiscussedTopic.Score,
                    MostDiscussedTopic.Label);
            }
        }
        else
        {
            Console.WriteLine("Topic Detection failed");
        }

        Console.WriteLine("------------------SENSITIVE CONTENT---------------");
        var TrickyImages = group.Files.Where(f => f.IsAdultContent || f.IsRacyContent);
        foreach (var TrickyImage in TrickyImages)
        {
            Console.WriteLine("{0} {1}",
                TrickyImage.Name,
                (TrickyImage.Tags != null) ?
                    string.Join("/", TrickyImage.Tags) : String.Empty);
        }

        Console.WriteLine("----------------NEGATIVE MESSAGES--------------");
        // Negative = scored below 0.5 (assumed cutoff, mirroring the positive threshold).
        // A score of -1 means the message was never scored, so it is skipped.
        var NegativeMessages = group.Messages
            .Where(m => m.SentimentScore >= 0 && m.SentimentScore < 0.5).Take(5);
        foreach (var NegativeMessage in NegativeMessages)
        {
            Console.WriteLine("ID : {0} Score : {1} KP : {2}",
                NegativeMessage.Id,
                NegativeMessage.SentimentScore,
                (NegativeMessage.KeyPhrases != null) ?
                    string.Join("/", NegativeMessage.KeyPhrases) : String.Empty);
        }

        Console.WriteLine("--------------POSITIVE MESSAGES--------------");
        var PositiveMessages = group.Messages.Where(m => m.SentimentScore >= 0.5).Take(5);
        foreach (var PositiveMessage in PositiveMessages)
        {
            Console.WriteLine(
                "ID : {0} Score : {1} KP : {2}",
                PositiveMessage.Id,
                PositiveMessage.SentimentScore,
                (PositiveMessage.KeyPhrases != null) ?
                    string.Join("/", PositiveMessage.KeyPhrases) : String.Empty);
        }
    }

    // Downloads each image from Yammer and runs the Computer Vision analysis on it.
    static async Task DownloadAndAnalyzeYammerFile(List<YammerImageFile> files)
    {
        HttpClient cli = new HttpClient();
        cli.DefaultRequestHeaders.Add("Authorization", YammerToken);
        VisionServiceClient vscli = new VisionServiceClient(ComputerVisionKey);
        VisualFeature[] features = new VisualFeature[] { VisualFeature.Adult, VisualFeature.Tags };
        // Watch out: with the free pricing tier, no more than 20 calls per minute.
        foreach (YammerImageFile file in files)
        {
            using (Stream image =
                await cli.GetStreamAsync(
                    string.Format(
                        "https://www.yammer.com/api/v1/uploaded_files/{0}/download",
                        file.Id)))
            {
                AnalysisResult result = await vscli.AnalyzeImageAsync(image, features);
                file.IsAdultContent = result.Adult.IsAdultContent;
                file.IsRacyContent = result.Adult.IsRacyContent;
                if (result.Tags != null && result.Tags.Any())
                {
                    // Keep only the tags the service is reasonably confident about.
                    file.Tags = result.Tags
                        .Where(t => t.Confidence >= 0.5)
                        .Select(t => t.Name)
                        .ToList();
                }
            }
        }
    }
    // Lists the image files uploaded to the given Yammer group.
    static async Task<List<YammerImageFile>> GetImageFiles(string groupid)
    {
        List<YammerImageFile> YammerFiles = new List<YammerImageFile>();
        HttpClient cli = new HttpClient();
        cli.DefaultRequestHeaders.Add("Authorization", YammerToken);
        HttpResponseMessage resp =
            await cli.GetAsync(
                string.Format(
                    "https://www.yammer.com/api/v1/uploaded_files/in_group/{0}.json?content_class=images",
                    groupid));

        JObject files = JObject.Parse(await resp.Content.ReadAsStringAsync());
        foreach (var file in files["files"])
        {
            YammerFiles.Add(new YammerImageFile
            {
                Id = file["id"].ToString(),
                Download_Url = file["download_url"].ToString(),
                Name = file["name"].ToString()
            });
        }
        return YammerFiles;
    }
    // Builds the {"documents":[{"id":...,"text":...}]} JSON payload expected by Text Analytics.
    static byte[] GetAPIData(List<YammerMessage> messages)
    {
        var settings = new JsonSerializerSettings
        {
            StringEscapeHandling = StringEscapeHandling.EscapeNonAscii
        };
        // SerializeObject takes care of quoting and escaping the message body.
        var documents = messages.Select(message =>
            string.Format("{{\"id\":\"{0}\",\"text\":{1}}}",
                message.Id,
                JsonConvert.SerializeObject(message.Body, settings)));

        // string.Join avoids a trailing comma and also copes with an empty list.
        return Encoding.UTF8.GetBytes(
            "{\"documents\":[" + string.Join(",", documents) + "]}");
    }
    // Retrieves all the messages of a group, paging backwards with older_than.
    static List<YammerMessage> GetYammerMessages(string groupid)
    {
        List<YammerMessage> ReturnedMessages = new List<YammerMessage>();
        string BaseUrl = "https://www.yammer.com/api/v1/messages/in_group/" +
            groupid + ".json?threaded=true";
        string LastMessageId = "";
        bool FullyParsed = false;

        while (!FullyParsed)
        {
            // The first call fetches the newest page; later calls ask for older messages.
            string url = string.IsNullOrEmpty(LastMessageId) ?
                BaseUrl : BaseUrl + "&older_than=" + LastMessageId;
            HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
            req.Headers.Add("Authorization", YammerToken);
            req.Accept = "application/json; odata=verbose";

            LastMessageId = "";
            using (StreamReader sr =
                    new StreamReader(req.GetResponse().GetResponseStream()))
            {
                JObject resp = JObject.Parse(sr.ReadToEnd());
                JArray messages = (JArray)resp["messages"];
                foreach (var message in messages)
                {
                    // Skip messages without text (file-only posts, for instance).
                    if (!string.IsNullOrEmpty(message["body"]["parsed"].ToString()))
                    {
                        ReturnedMessages.Add(new YammerMessage
                        {
                            Id = message["id"].ToString(),
                            SentimentScore = -1, // placeholder until the message is scored
                            Body = message["body"]["parsed"].ToString()
                        });
                    }
                    LastMessageId = message["id"].ToString();
                }
            }

            // An empty page means there is nothing older left to fetch.
            FullyParsed = string.IsNullOrEmpty(LastMessageId);
        }
        return ReturnedMessages;
    }
    // Submits the documents for Topic Detection, then polls until the background job completes.
    static async Task<bool> GetGroupTopics(byte[] data, YammerGroup group)
    {
        group.Topics = new List<Topic>();
        HttpClient cli = new HttpClient();
        cli.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", TextAnalyticsKey);
        var response = await cli.PostAsync(
            "https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/Topics/?minDocumentsPerWord=15",
            new ByteArrayContent(data));
        // The URL to poll is returned in the Operation-Location response header.
        var OperationLocation = response.Headers.GetValues("Operation-Location").First();
        while (true) // should implement timeout
        {
            JObject documents =
                    JObject.Parse(await GetTopicResult(cli, OperationLocation));
            string status = documents["status"].ToString().ToLowerInvariant();
            if (status == "succeeded")
            {
                JArray topics = (JArray)documents["operationProcessingResult"]["topics"];
                foreach (var topic in topics)
                {
                    group.Topics.Add(new Topic
                    {
                        Label = topic["keyPhrase"].ToString(),
                        Score = (double)topic["score"]
                    });
                }
                return true;
            }
            else if (status == "failed")
            {
                return false;
            }
            else
            {
                // Wait a minute before polling again, without blocking the thread.
                await Task.Delay(60000);
            }
        }
    }

    // Fetches the current state (and, once finished, the results) of the Topic Detection job.
    static async Task<string> GetTopicResult(HttpClient client, string uri)
    {
        var response = await client.GetAsync(uri);
        return await response.Content.ReadAsStringAsync();
    }
    // Calls either the keyPhrases or the sentiment operation and stores the results
    // on the matching messages.
    static async Task KeyPhraseOrSentiment(byte[] data, List<YammerMessage> messages, bool sentiment = false)
    {
        HttpClient cli = new HttpClient();
        cli.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", TextAnalyticsKey);
        var uri = sentiment ?
                "https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment" :
                "https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/keyPhrases";
        var response =
            await cli.PostAsync(
                uri,
                new ByteArrayContent(data));
        JObject analysis = JObject.Parse(await response.Content.ReadAsStringAsync());
        foreach (var document in analysis["documents"])
        {
            // Match the returned document with the message that has the same ID.
            string id = document["id"].ToString();
            var TargetMessage = messages.SingleOrDefault(m => m.Id.Equals(id));
            if (TargetMessage != null)
            {
                if (sentiment)
                {
                    TargetMessage.SentimentScore = (double)document["score"];
                }
                else
                {
                    List<string> kp = new List<string>();
                    foreach (var KeyPhrase in (JArray)document["keyPhrases"])
                    {
                        kp.Add(KeyPhrase.ToString());
                    }
                    TargetMessage.KeyPhrases = kp;
                }
            }
        }
    }
}

Here’s a short explanation of the code, which performs the following sequence of actions:

  • We retrieve the messages from the Yammer Group using the Yammer REST API, together with the token we created earlier.
  • We prepare the dataset in JSON format, to be sent to the APIs. The same dataset may be sent to different APIs.
  • We perform the Key Phrases extraction.
  • We perform the Sentiment Analysis.
  • We retrieve the Yammer group’s images and run the Computer Vision analysis against them.
  • We launch the Topic Detection.
  • We end up performing some LINQ queries against the list of objects we built in the previous steps.
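To give an idea of what the last step could look like when preparing data for a report, here is a small sketch that computes an average sentiment and the most frequent key phrases. `ScoredMessage` is a simplified stand-in for the YammerMessage class, and `GroupReport` is a hypothetical helper; neither is part of the original sample.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-in for the YammerMessage class used in the article.
class ScoredMessage
{
    public double SentimentScore { get; set; }
    public List<string> KeyPhrases { get; set; }
}

static class GroupReport
{
    // Average sentiment over the messages that actually received a score (>= 0).
    // Returns NaN when no message was scored.
    public static double AverageSentiment(IEnumerable<ScoredMessage> messages)
    {
        var scored = messages.Where(m => m.SentimentScore >= 0).ToList();
        return scored.Count == 0 ? double.NaN : scored.Average(m => m.SentimentScore);
    }

    // The most frequent key phrases across all messages, compared case-insensitively.
    public static List<KeyValuePair<string, int>> TopKeyPhrases(
        IEnumerable<ScoredMessage> messages, int count)
    {
        return messages
            .Where(m => m.KeyPhrases != null)
            .SelectMany(m => m.KeyPhrases)
            .GroupBy(p => p, StringComparer.OrdinalIgnoreCase)
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
            .OrderByDescending(p => p.Value)
            .ThenBy(p => p.Key, StringComparer.OrdinalIgnoreCase)
            .Take(count)
            .ToList();
    }
}
```

Aggregates like these are the kind of rows one might store in a database and chart in Power BI.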

Most of the methods developed here are nothing more than raw HTTP requests. At the time of writing, Text Analytics does not ship with a NuGet package, unlike the Computer Vision API. The trickiest part is detecting topics, as Azure runs a background job and one has to wait until this job is finished. Here is an example of the output: figure8

Try it yourself 

If you try out the above code, it should work. But here are some things you should pay particular attention to:

  • Set up the environment correctly, as explained at the beginning of this article.
  • Check the pricing tier you are using, as the number of allowed transactions per month (but also per minute on some operations) varies according to the plan.
  • Try it against a Yammer group that contains at least 100 messages, because the Topic Detection operation requires a minimum of 100 documents.
  • Topic Detection is very sensitive, so you’d better test it against genuine content.
  • For the sake of brevity and readability, I didn’t include any exception handling in the code samples, but you should of course add it for production code.
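Regarding the per-minute limits, the code sample mentions 20 calls per minute for Computer Vision on the free tier. One simple way to stay under such a limit is to space the calls evenly. The sketch below is only an illustration: `CallThrottle` is a hypothetical helper, and the limit you pass in must come from your own plan.

```csharp
using System;
using System.Threading.Tasks;

// Spaces out calls so that consecutive calls are at least 60 / maxCallsPerMinute seconds apart.
class CallThrottle
{
    private readonly TimeSpan _minInterval;
    private DateTime _nextAllowed = DateTime.MinValue;

    public CallThrottle(int maxCallsPerMinute)
    {
        if (maxCallsPerMinute <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(maxCallsPerMinute));
        }
        _minInterval = TimeSpan.FromSeconds(60.0 / maxCallsPerMinute);
    }

    public TimeSpan MinInterval { get { return _minInterval; } }

    // Waits until the next call is allowed. Not thread-safe: meant for a sequential loop.
    public async Task WaitAsync()
    {
        DateTime now = DateTime.UtcNow;
        if (now < _nextAllowed)
        {
            await Task.Delay(_nextAllowed - now);
            now = DateTime.UtcNow;
        }
        _nextAllowed = now + _minInterval;
    }
}
```

In the image loop, one would create a single new CallThrottle(20) before the loop and call await throttle.WaitAsync() before each analysis request.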

The entire code sample can be found on GitHub: https://github.com/stephaneey/azure-cognitive-services/.

Conclusion

In this article, we have seen how to mine data coming from one Yammer group. But you can of course expand that to all the groups. If you plan to analyze the entire Yammer content, you’d better use Yammer’s export API. I have written a NuGet Package (Install-Package Eyskens.YammerExportAPIWrapper) that helps with exporting data and files.

We only worked with two Azure Cognitive Services APIs, but they offer much more. You can check them out on the official documentation page: https://www.microsoft.com/cognitive-services/en-us/apis.

There are many other use cases where these APIs may come into play, not only for reporting purposes, but also to control user input in real-time and act upon it.



Stéphane Eyskens is a senior technical architect in Office 365, Azure PaaS and SharePoint. He’s been a consultant for 17 years, and holds a master’s degree in ICT Sciences. During his studies, he learnt about NLP and NER techniques, hence the natural attraction for Azure Cognitive Services.

 

Follow him on Twitter @stephaneeyskens