Hopefully you have seen our announcement about the new Azure API Management Analytics Power BI solution template, giving you eye-catching, customizable reports on your API Management traffic. This post takes a deep dive into its component parts, how they work, and how you can customize the solution to meet your individual requirements.
You can install the solution today at http://aka.ms/apimpbi, or you can browse the source code on GitHub. There is no additional charge for the template itself; it is deployed to your Azure subscription, and you pay only for the components you consume (we estimate this to be under $10 a day).
What is the solution made up of?
To install the solution, you will need to log in with credentials that have access to your Azure subscription. Once you have done so, we create the following:
- Azure Event Hub
- Azure Stream Analytics
- Azure SQL (or you can use an existing instance)
- Azure Analysis Services (optional, additional cost, for high-scale deployments)
- 4 Logic Apps
- Function App containing 3 Functions
- Azure Machine Learning Web Service
Finally, we call the management API of your API Management service to install a Logger (associated with the Event Hub) and a Global Policy that uses log-to-eventhub.
Log to Event Hub
Important – if you are already using a Global Policy in your service, running the template installer will overwrite it. Please take a backup of your policy before running. The policy used can be found in the solution template GitHub repo.
This solution uses the Log-to-Event Hub policy to copy data from the request/response/error stream of your API Management instance. The policy creates a comma-separated list of values drawn from the context, and pushes them to Event Hub for streaming. There are 3 separate policies: one each for Request, Response, and Error. These separate streams go all the way through to the solution’s database. Only then are they joined together in the views used to output data to Power BI.
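To make the shape of this concrete, here is a simplified sketch of a global policy with three log-to-eventhub sections. The logger id and field list are illustrative only; the actual policy shipped in the repo logs a richer set of fields:

```xml
<policies>
    <inbound>
        <!-- "apim-pbi-logger" is an illustrative logger id, not necessarily the template's -->
        <log-to-eventhub logger-id="apim-pbi-logger">@{
            // Build a comma-separated "Request" record from the policy context.
            return string.Join(",", new[] {
                "Request",
                context.RequestId.ToString(),
                DateTime.UtcNow.ToString("o"),
                context.Api.Name,
                context.Operation.Name,
                context.Request.IpAddress
            });
        }</log-to-eventhub>
    </inbound>
    <backend>
        <forward-request />
    </backend>
    <outbound>
        <log-to-eventhub logger-id="apim-pbi-logger">@{
            // "Response" record, joinable to its request via RequestId
            return string.Join(",", new[] {
                "Response",
                context.RequestId.ToString(),
                context.Response.StatusCode.ToString()
            });
        }</log-to-eventhub>
    </outbound>
    <on-error>
        <log-to-eventhub logger-id="apim-pbi-logger">@{
            // "Error" record for gateway-level failures
            return string.Join(",", new[] {
                "Error",
                context.RequestId.ToString(),
                context.LastError.Message
            });
        }</log-to-eventhub>
    </on-error>
</policies>
```

Because all three records carry the RequestId, the downstream views can stitch requests, responses, and errors back together after they land in separate tables.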
Event Hub & Stream Analytics
A simple query is used in Stream Analytics to select data from each stream and write it to the respective table in Azure SQL. Requests flow through the solution quickly, meaning the data in SQL is near real-time.
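A pass-through Stream Analytics query of this kind might look like the sketch below. The input/output aliases and column names are assumptions for illustration, not the template’s actual names:

```sql
-- One SELECT ... INTO per stream; [apim-eventhub] is the Event Hub input alias,
-- [RequestsSQL] / [ResponsesSQL] are Azure SQL output aliases (illustrative names).
SELECT RequestId, RequestTime, ApiName, OperationName, IpAddress
INTO [RequestsSQL]
FROM [apim-eventhub]
WHERE RecordType = 'Request'

SELECT RequestId, ResponseTime, StatusCode
INTO [ResponsesSQL]
FROM [apim-eventhub]
WHERE RecordType = 'Response'
```

Stream Analytics allows multiple SELECT … INTO statements in one job, which is how a single Event Hub input can fan out to the separate request, response, and error tables.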
Once rows are written to SQL, the data stream is complete, and the data can be viewed in Power BI. Whilst there is no archival process in the solution template, most queries are limited to 90 days or less for performance reasons (this can, of course, be changed at your discretion).
Data Views & Power BI
All interaction between Power BI and the database is done through SQL views. When refreshing the Power BI report, it selects from them to populate its local data model. There are a number of them (e.g. the ‘summary’ views provide lookup tables for APIs, Operations, Products & Subscriptions), but the most important are as follows:
- AllRequestData: this view joins all requests to their subsequent responses. Most of your API data is found here, including backend errors. This should be the largest table by row count in your data set.
- AllErrorDetail: this view contains details of all API Management Gateway errors.
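As a sketch of the request/response join described above, a view of this shape would do the job. All table and column names here are assumptions rather than the template’s actual schema:

```sql
-- Illustrative schema: a Requests table and a Responses table,
-- linked by the RequestId emitted in the policy.
CREATE VIEW AllRequestData AS
SELECT
    req.RequestId,
    req.RequestTime,
    req.ApiName,
    req.OperationName,
    req.IpAddress,
    resp.StatusCode,           -- includes backend error codes
    resp.ResponseTime
FROM Requests AS req
LEFT JOIN Responses AS resp
    ON resp.RequestId = req.RequestId
-- 90-day window, matching the performance limit mentioned above
WHERE req.RequestTime > DATEADD(day, -90, GETUTCDATE());
```

The LEFT JOIN keeps requests that never received a response, which is useful when diagnosing timeouts and dropped calls.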
In addition to the base data streamed from API Management, we do some additional processing of your data for further insight. All of that processing is managed by Logic Apps. I’ll explain what each one does:
- ProcessIPAddresses: in order to understand where your API calls are coming from, we need to turn each IP address into a latitude/longitude. This Logic App calls a stored procedure every 15 minutes to assign lat/long values.
- CallFrequency: this Logic App triggers an Azure Machine Learning job that creates a fast Fourier transform of the calls from the top 50 IP addresses seen in the last 72 hours. The job runs every 6 hours.
- CallGraph: this Logic App calls into 2 Azure Functions that look for call correlations (one operation call followed by another) within a one-second window over the last 72 hours. The job runs once every hour.
- LoadIPAddressDB: this product includes GeoLite2 data created by MaxMind, available from http://www.maxmind.com. This Logic App runs every 30 days to download the latest IP address file and save it to the database.
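As an illustration of the ProcessIPAddresses step, the stored procedure it calls could assign coordinates by looking up each unresolved address in the GeoLite2 range table. The procedure, table, and column names below are all hypothetical, not the template’s actual schema:

```sql
-- Hypothetical sketch: resolve lat/long for request rows not yet geolocated.
-- Assumes IP addresses are stored (or pre-converted) as integers so that a
-- range lookup against the GeoLite2 block table is possible.
CREATE PROCEDURE AssignLatLong AS
BEGIN
    UPDATE r
    SET r.Latitude  = g.Latitude,
        r.Longitude = g.Longitude
    FROM Requests AS r
    JOIN GeoLite2Blocks AS g
        ON r.IpNumber BETWEEN g.StartIpNumber AND g.EndIpNumber
    WHERE r.Latitude IS NULL;   -- only touch rows not yet resolved
END
```

Running this on a 15-minute schedule keeps the geolocation work incremental, so each run only processes the requests that arrived since the last one.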
Frequently Asked Questions
Q. I have high amounts of API traffic. Will this solution scale to my needs?
A. Each component in the streaming pipeline (Event Hubs, Stream Analytics, Azure SQL) is independently scalable. Additionally, Azure Analysis Services can be added to support very high request volumes. The first component that typically comes under pressure at scale is Azure SQL, which may need to be scaled up to support high daily call volumes (e.g. 50k requests/day).
Q. I tried it, but don’t want to use it any more. What do I do?
A. No problem! Simply delete the global policy, and delete the solution template resource group. Your API Management instance will continue running without issue.
Q. I want to add additional data – e.g. the request body. How can I do this?
A. You will need to make sure that the new field flows all the way through to the report. For example:
- Add a new column to capture the data in the appropriate table in the database. For example, for an additional string that is part of the request, create a varchar column in the request table.
- Modify the AllRequestData view to also return that field.
- Update the data model in Power BI to include the new field.
- Create or modify a report control to use that data.
- Add the field to the Stream Analytics request query.
- Modify the Global Policy to emit the new field. Ensure the field name matches the name you refer to in the Stream Analytics query.
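Concretely, the database-side steps for the request-string example above might look like this (the table, view, and column names are illustrative, not the template’s actual schema):

```sql
-- 1. New column on the request table
ALTER TABLE Requests ADD RequestBody VARCHAR(MAX) NULL;
GO

-- 2. Return it from the view that Power BI reads
ALTER VIEW AllRequestData AS
SELECT
    req.RequestId,
    req.RequestTime,
    req.RequestBody,           -- the new field, now visible to Power BI
    resp.StatusCode
FROM Requests AS req
LEFT JOIN Responses AS resp
    ON resp.RequestId = req.RequestId;
```

On the API Management side, the Global Policy would append the corresponding value to the comma-separated record (for a request body, an expression such as context.Request.Body.As&lt;string&gt;(preserveContent: true) can capture it), and the Stream Analytics request query would select a column of the same name into the request table.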