What Can You Do with Azure Media Indexer?



General Availability of Azure Media Indexer has been announced. Azure Media Indexer, formerly known as the Microsoft Audio Video Indexing Service (MAVIS), is a powerful, market-differentiated content extraction service that lets you search inside a video rather than just its metadata. Further background information on Azure Media Indexer can be found here.

Now, what exactly can you do with it? For the basic usage of submitting a video asset to Azure Media Indexer to generate closed captions and keywords, please see the blog. To explore the scenarios in which Azure Media Indexer can be used, I have developed a demo solution. It contains reusable code (in the form of a C# DLL project), SQL code and a data model, and it makes use of SQL Server full-text search.


The Demo Solution

The high-level architecture of the solution is shown in the following diagram:

The home page of the demo solution can be accessed here.

The solution operates in two modes, each tied to a different time:

  1. Input mode during workflow time (depicted by the dark blue lines in the above diagram): during this time, video assets are processed by Azure Media Indexer. Specifically,
    1. A media asset is submitted for processing as an Azure Media Indexer task;
    2. The output caption files are uploaded to origin servers so that closed captions can be displayed during playback;
    3. The .aib (Audio Index Blob) file is uploaded into a SQL Server database and then indexed by the SQL Server full-text indexing component for full-text search.
  2. Output mode during query or playback time (depicted by the red lines in the above diagram):

    1. Audio search: query terms are pre-processed and submitted to SQL Server full-text search. The search results are post-processed to extract snippet information and are displayed in ASP.NET pages;
    2. Playback with closed captions: a TTML or SAMI file can be used during playback for closed caption (CC) display;
    3. Keywords: the keywords for a given video can be retrieved from the database at any time.
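The playback-side use of the caption files can be illustrated with a small parser. The following is a minimal Python sketch, not code from the demo solution. It assumes a caption file in the W3C TTML namespace (`http://www.w3.org/ns/ttml`) in which each caption is a `<p>` element with `begin`/`end` attributes in `HH:MM:SS.fff` clock format. The exact TTML emitted by Azure Media Indexer may use other attributes or timing formats. The sample document and the helper names `parse_clock` and `read_cues` are hypothetical, and SAMI files are not handled.

```python
import xml.etree.ElementTree as ET

TTML_NS = "{http://www.w3.org/ns/ttml}"

def parse_clock(value: str) -> float:
    """Convert an 'HH:MM:SS(.fff)' clock value to seconds (assumed format)."""
    hours, minutes, seconds = value.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

def read_cues(ttml_text: str) -> list[tuple[float, float, str]]:
    """Return (begin, end, text) for every <p> caption element, in document order."""
    root = ET.fromstring(ttml_text)
    cues = []
    for p in root.iter(f"{TTML_NS}p"):
        text = " ".join("".join(p.itertext()).split())  # collapse whitespace
        cues.append((parse_clock(p.get("begin")), parse_clock(p.get("end")), text))
    return cues

# Hypothetical sample; real Azure Media Indexer output may differ.
SAMPLE = """<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en-US">
  <body>
    <div>
      <p begin="00:00:01.000" end="00:00:04.500">Welcome to Build.</p>
      <p begin="00:01:02.250" end="00:01:05.000">Let's talk about Roslyn.</p>
    </div>
  </body>
</tt>"""

if __name__ == "__main__":
    for begin, end, text in read_cues(SAMPLE):
        print(f"{begin:8.3f} -> {end:8.3f}  {text}")
```

Each (begin, end, text) tuple maps directly onto a timed caption cue that a player can show while the video plays.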


The Scenarios Covered by the Demo Solution

Currently the solution covers the following scenarios:

  1. Closed Captions: Generate closed caption text files in TTML or SAMI format by running an Azure Media Indexer task via the Azure Media Services .NET API. These closed caption files can then be used with the videos to display closed captions. From this HTML5 page, you can view the videos together with the closed captions generated by Azure Media Indexer.
  2. Audio Search: Today, a video is like a printed book without an index or a table of contents: there is no way to search through it. Azure Media Indexer indexes the audio track of a video. The binary index can then be imported into SQL Server so that SQL Server full-text search can be used to perform stemmer-based text search. Users can search either a collection of videos or a specific video by entering search text. The search text is first pre-processed to generate the inflectional forms of the search terms. The search results are displayed as snippets that contain the search terms. Clicking a snippet starts the video at the beginning of the snippet, minus 2 seconds (an offset that can easily be changed). For example, the video library contains a 2-hour-45-minute live archive of a Build 2014 keynote. If you search for “NBC Sports”, you can jump to the section of the long video in which Scott Guthrie and Rick Cordella talked about the NBC Sports/Sochi Olympics solution. Similarly, if you search for “Roslyn”, you can jump right to the section of the video in which “Project Roslyn” is presented. Without this kind of audio search, it is hard to locate specific sections of a long video. In addition, since the search includes inflectional forms, a search for “focus” also returns snippets containing “focusing” or “focused”. You can try it here.
  3. Sub-clipping based on search results: once we have the desired search results (video snippets), we can use Azure Media Encoder to sub-clip them and stitch a set of snippets together into a new video asset. Details of sub-clipping and video stitching using Azure Media Encoder can be found here. You can try the results here.
  4. Create keywords for video: Keywords generated by Azure Media Indexer can be exposed to search engines such as Bing, Google or Microsoft SharePoint (in an Enterprise Content Management context) to make the media files more discoverable, or used to deliver more relevant ads. Keywords can also be translated into different languages.
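The snippet-to-playback and sub-clipping steps mostly come down to interval arithmetic. The Python sketch below is illustrative only and makes the following assumptions:

* Each search hit is a (start, end) pair in seconds.
* The 2-second lead-in from the Audio Search scenario is applied and clamped at zero.
* Overlapping or nearly adjacent snippets are merged, so each region appears only once in a stitched clip list.

The names `seek_position` and `merge_snippets`, and the `gap` parameter, are hypothetical. Submitting the actual sub-clipping job to Azure Media Encoder is not shown.

```python
LEAD_IN_SECONDS = 2.0  # offset subtracted from the snippet start (easily changed)

def seek_position(snippet_start: float, lead_in: float = LEAD_IN_SECONDS) -> float:
    """Playback start for a clicked snippet, never before the start of the video."""
    return max(0.0, snippet_start - lead_in)

def merge_snippets(snippets, lead_in: float = LEAD_IN_SECONDS, gap: float = 1.0):
    """Turn (start, end) hits into sorted, non-overlapping sub-clip ranges.

    Each range starts `lead_in` seconds early; ranges separated by no more
    than `gap` seconds are merged so the stitched video does not repeat footage.
    """
    ranges = sorted((seek_position(s, lead_in), e) for s, e in snippets)
    merged = []
    for start, end in ranges:
        if merged and start <= merged[-1][1] + gap:
            merged[-1][1] = max(merged[-1][1], end)  # extend the previous clip
        else:
            merged.append([start, end])
    return [tuple(r) for r in merged]

if __name__ == "__main__":
    hits = [(125.0, 131.5), (1.0, 6.0), (130.0, 140.0), (600.0, 612.0)]
    for start, end in merge_snippets(hits):
        print(f"sub-clip {start:.1f}s to {end:.1f}s")
```

Merging before stitching keeps the new asset free of duplicated segments when several search hits fall close together in the same part of a long video.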

NOTE: In the search scenario, all inflectional forms of a given search term are searched. For example, searching for “thought” results in the following query:

SELECT FileID, FilePath, Title, Description, AIB, TTML, CT.Rank
FROM Files
INNER JOIN CONTAINSTABLE (Files, AIB, 'FORMSOF(INFLECTIONAL,"spoken:thought@@@")', 30) AS CT
    ON Files.FileID = CT.[KEY]
ORDER BY RANK DESC

It then returns results containing “thoughts”, “think”, or “thinking”.
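The pre-processing step that produces this query can be sketched in Python. This is a hedged illustration, not the demo's C# code. It reproduces the `FORMSOF(INFLECTIONAL,"spoken:<term>@@@")` pattern from the query above for each word of the search text. It joins multiple words with `AND`, which is an assumption: the demo and the AIB add-on may handle multi-word searches differently.

The sketch keeps only letters, digits and apostrophes, and doubles single quotes so the condition can be embedded as a T-SQL string literal. If the condition is passed as a query parameter instead (for example `CONTAINSTABLE(Files, AIB, @condition, 30)`), the quote doubling is unnecessary and injection is avoided. `build_search_condition` and `build_query` are hypothetical names.

```python
import re

TOP_N = 30  # ranked rows requested from CONTAINSTABLE, as in the query above

def build_search_condition(search_text: str) -> str:
    """One FORMSOF(INFLECTIONAL, ...) clause per word, joined with AND (assumed)."""
    terms = re.findall(r"[A-Za-z0-9']+", search_text.lower())
    if not terms:
        raise ValueError("search text contains no searchable terms")
    return " AND ".join(f'FORMSOF(INFLECTIONAL,"spoken:{t}@@@")' for t in terms)

def build_query(search_text: str, top_n: int = TOP_N) -> str:
    """Embed the condition as a T-SQL string literal (single quotes doubled)."""
    condition = build_search_condition(search_text).replace("'", "''")
    return (
        "SELECT FileID, FilePath, Title, Description, AIB, TTML, CT.Rank "
        "FROM Files INNER JOIN CONTAINSTABLE (Files,AIB,"
        f"'{condition}',{int(top_n)}) AS CT ON Files.FileID = CT.[KEY] "
        "ORDER BY RANK DESC"
    )
```

For example, `build_query("thought")` yields the single-line form of the query above, and `build_search_condition("NBC Sports")` yields two `FORMSOF` clauses joined by `AND`.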


The Limitations

As a first release, Azure Media Indexer currently has the following limitations. Your feedback is certainly welcome and will help shape its future enhancements.

  1. Azure Media Indexer supports the English language only;
  2. The processing time is generally longer than the video duration. Therefore, Azure Media Indexer is not intended for real-time scenarios or for scenarios that need a fast turnaround. For example, for the Build 2014 keynote live archive video (2 hours 45 minutes in duration) used in my solution, the Azure Media Indexer task ran for 4 hours 29 minutes, a ratio of about 1.6x. For shorter video clips, this ratio is much larger.
  3. SQL Azure is not currently supported by the SQL Add-on for full-text search. In this implementation, I have used 64-bit SQL Server 2014 on Windows Server 2012 R2 running on an Azure IaaS VM.
  4. The input formats accepted by Azure Media Indexer are limited to the following; adaptive bitrate streaming formats are not yet supported:
    1. RIFF WAV Audio, PCM encoding
    2. Windows Media Audio
    3. Windows Media Video
    4. MPEG Layer-3 Audio
    5. H.264 (MPEG-4 Part 10) Video
    6. MPEG-4 Audio
    7. AAC (Advanced Audio Coding) Audio
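The processing-time figure above can be turned into a rough estimate for capacity planning, and inputs can be screened before a task is submitted. The Python sketch below treats the measured Build 2014 ratio (4 hours 29 minutes of processing for 2 hours 45 minutes of video) as an optimistic baseline, since shorter clips take proportionally longer.

The file-extension map (`SUPPORTED_EXTENSIONS`) is a hypothetical convenience. A container extension such as .mp4 does not guarantee the codec inside it, and the names `is_supported` and `estimate_processing` are illustrative.

```python
from datetime import timedelta
from pathlib import Path

# Ratio measured in this post: 4 h 29 min of processing for a 2 h 45 min video.
MEASURED_RATIO = timedelta(hours=4, minutes=29) / timedelta(hours=2, minutes=45)

# Hypothetical extension map for the listed input formats; the codec inside
# a container (e.g. .mp4) should still be verified separately.
SUPPORTED_EXTENSIONS = {
    ".wav": "RIFF WAV Audio, PCM encoding",
    ".wma": "Windows Media Audio",
    ".wmv": "Windows Media Video",
    ".mp3": "MPEG Layer-3 Audio",
    ".mp4": "H.264 video with MPEG-4 or AAC audio",
    ".m4a": "MPEG-4 / AAC Audio",
    ".aac": "AAC (Advanced Audio Coding) Audio",
}

def is_supported(path: str) -> bool:
    """Rough pre-check: does the file extension map to a listed input format?"""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS

def estimate_processing(duration: timedelta, ratio: float = MEASURED_RATIO) -> timedelta:
    """Optimistic estimate: short clips usually take longer relative to their length."""
    return duration * ratio
```

Adaptive streaming manifests such as `.ism` or `.m3u8` fail the `is_supported` check, which matches the limitation stated in item 4.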




This blog has presented an end-to-end implementation of a demo solution using Azure Media Indexer, discussed the four scenarios it covers, and outlined the service's current limitations.
