TextAnalytics with AzureML

Text Analytics

Text Analytics is the process of transforming text into information and actionable output. Text is prevalent in every industry. If something doesn’t already exist in text format, it ultimately ends up as text so that it can be consumed. Most Machine Learning algorithms take their input as CSV or JSON, which are textual representations of objects. Speech is also converted to text in speech recognition applications. Why? Because it is easier for applications to parse and consume text. Understanding the context is a different story, and that is where Natural Language Processing comes into play, by transforming text into understandable corpora and lexicons. What does Microsoft have to offer in Text Analytics? Combined, our applications (Office and Bing) process more text than anyone else in the world.

The AzureML Team has produced a series of experiments for building Text Classification models.

Watch this video: Learn How to Create Text Analytics Solutions with Azure Machine Learning Templates


Text Classification

Step 1: Data preparation

This step covers loading, editing, cleaning, partitioning, and filtering the dataset.
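To make the loading, filtering, and partitioning idea concrete outside of AzureML Studio, here is a minimal Python sketch. It is not the Gallery experiment itself. The column names `label` and `text`, the function name `load_and_split`, and the 75/25 split are assumptions chosen for illustration.

```python
import csv
import io
import random

def load_and_split(csv_text, test_fraction=0.25, seed=0):
    """Load labeled rows from CSV text, drop rows with empty text,
    and partition them into training and test sets."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Filtering: keep only rows that actually contain some text.
    rows = [row for row in reader if (row.get("text") or "").strip()]
    # Partitioning: shuffle with a fixed seed so the split is repeatable.
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]
```

A fixed seed keeps the partition stable between runs, which makes later model comparisons easier to reproduce.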

AzureML Gallery Link

Step 2: Text preprocessing

In the preprocessing step, the experiment demonstrates why cleaning the text matters. Preprocessing examples include removing special characters, assigning contextual meaning to special characters and text symbols such as :) or LOL, and removing duplicates, punctuation, and stop words.
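The operations listed above can be sketched in a few lines of Python. This is an illustration only, not the code used in the Gallery experiment. The lookup tables `SYMBOL_MEANINGS` and `STOP_WORDS` are tiny hypothetical examples, and a real stop-word list would be much longer.

```python
import re

# Hypothetical lookup tables, kept tiny for illustration.
SYMBOL_MEANINGS = {":)": "smile", ":(": "frown", "lol": "laugh"}
STOP_WORDS = {"a", "an", "the", "is", "it", "to", "and", "of"}

def preprocess(text):
    """Lowercase the text, map symbols to words, and drop
    punctuation, special characters, and stop words."""
    tokens = []
    for raw in text.lower().split():
        # Give emoticons a meaning before punctuation is stripped.
        if raw in SYMBOL_MEANINGS:
            tokens.append(SYMBOL_MEANINGS[raw])
            continue
        # Remove punctuation and any other special characters.
        word = re.sub(r"[^a-z0-9]", "", raw)
        # Map slang such as "lol" once trailing punctuation is gone.
        word = SYMBOL_MEANINGS.get(word, word)
        if word and word not in STOP_WORDS:
            tokens.append(word)
    return tokens

def deduplicate(documents):
    """Remove exact duplicate documents, keeping the original order."""
    seen = set()
    unique = []
    for doc in documents:
        if doc not in seen:
            seen.add(doc)
            unique.append(doc)
    return unique
```

For example, `preprocess("The movie is GREAT :) LOL!!")` yields the tokens `movie`, `great`, `smile`, and `laugh`.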

AzureML Gallery Link

Step 3: Feature engineering

Step 3 has two parts:

3A: n-grams TF feature extraction

3B: unigrams TF-IDF feature extraction

Now that you have the data cleaned up, it’s time to extract features.
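As a rough picture of what parts 3A and 3B compute, the sketch below counts n-gram term frequencies (TF) and weights unigram counts by inverse document frequency (TF-IDF). It assumes the plain `log(N / df)` form of IDF; the AzureML experiments may use a different variant. The function names are hypothetical.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def term_frequencies(tokens, max_n=1):
    """Count every n-gram of length 1..max_n in one document."""
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts

def tf_idf(documents):
    """Weight unigram counts by log(N / document frequency).
    Assumption: unsmoothed IDF, one of several common variants."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    weighted = []
    for tokens in documents:
        tf = term_frequencies(tokens)
        weighted.append(
            {term: count * math.log(n_docs / doc_freq[term])
             for term, count in tf.items()}
        )
    return weighted
```

With this weighting, a term that appears in every document gets a weight of zero, so only terms that help tell documents apart carry signal.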

To a machine learning algorithm, which is mathematical at heart, raw text features don’t make much sense. This module therefore demonstrates the use of Feature Hashing to convert variable-length text into numeric feature vectors. When it’s numbers, math is happy. The step also demonstrates how to reduce the dimensionality of the feature vectors using the “Filter Based Feature Selection” module.
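The hashing trick itself is simple enough to sketch in plain Python. The version below uses MD5 from the standard library purely as a stand-in hash; it does not reproduce the hash function or parameters of the AzureML Feature Hashing module. The function name `hash_features` and the `n_bits` parameter are assumptions for illustration.

```python
import hashlib

def hash_features(tokens, n_bits=10):
    """Map a variable-length token list to a fixed-length count vector
    of size 2**n_bits using the hashing trick."""
    size = 2 ** n_bits
    vector = [0] * size
    for token in tokens:
        # Stand-in hash: MD5 is deterministic across runs, unlike hash().
        digest = hashlib.md5(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "little") % size
        vector[index] += 1  # colliding tokens share a slot
    return vector
```

Every document ends up with a vector of the same length no matter how many words it contains. A smaller `n_bits` gives shorter vectors but more collisions between unrelated tokens.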

Step 4: Train and evaluate models

The hard part is over. In this step, you select your favorite algorithm(s) and train the machine learning model.
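For readers who want to see training and evaluation end to end without the Studio, here is a small sketch. It uses multinomial naive Bayes with add-one smoothing, chosen only because it fits in a few lines; it is not necessarily an algorithm used in the Gallery experiments. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs, labels):
    """Collect the counts needed by a multinomial naive Bayes classifier."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict(model, tokens):
    """Return the label with the highest log-probability."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total_docs)  # class prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in tokens:
            if token in vocab:  # ignore words never seen in training
                # Add-one smoothing avoids log(0) for unseen pairs.
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the true one."""
    correct = sum(predict(model, d) == y for d, y in zip(docs, labels))
    return correct / len(labels)
```

Evaluation should be run on the held-out partition from Step 1, not on the documents used for training.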

AzureML Gallery Link

Step 5: Deploy trained models as web services

Step 5 has two parts:

5A: Deploy TF Web Service

5B: Deploy TF-IDF Web Service

Finally, you deploy the trained models as web services to be used in your applications.
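Once a model is deployed, an application calls it over HTTPS. The sketch below only builds the request and does not send it. It assumes the request-response JSON layout commonly generated for AzureML Studio web services (`Inputs`, `ColumnNames`, `Values`, `GlobalParameters`) and a Bearer API key. The input name `input1` and the column name `text` are assumptions, and the URL is whatever endpoint your own service exposes; check the sample code on your service's API help page for the exact shape.

```python
import json
import urllib.request

def build_scoring_request(url, api_key, texts):
    """Build (but do not send) a POST request that scores a list of texts."""
    body = {
        "Inputs": {
            "input1": {  # assumed input name
                "ColumnNames": ["text"],  # hypothetical column name
                "Values": [[t] for t in texts],
            }
        },
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
    )
```

To actually call the service, pass the returned object to `urllib.request.urlopen` and parse the JSON response.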

Text Analytics Web Service

If you don’t want to build an ML service from scratch, the AzureML team has also published a Text Analytics web service in the Azure DataMarket, with sample code and documentation.

https://datamarket.azure.com/dataset/amla/text-analytics

Documentation

Sample Code

Text Analytics and Vowpal Wabbit in Azure Machine Learning Studio

https://azure.microsoft.com/en-us/documentation/videos/text-analytics-and-vowpal-wabbit-in-azure-ml-studio/