As promised, in this blog, I will create a Power BI visualization based on Hans Rosling's Ted Talk seen here. While you don't need any Power BI experience to follow along and recreate this, if you want a basic tutorial I would suggest you watch these short videos here first. It's not that anything I do is difficult, but the nature of the source data and my desire to add additional data sources over time, led me to the decision to spend a little extra time "modeling", or organizing and relating my data to provide a foundation for current AND future analysis.
My goals in this demo are simple:
- Mashup 3 different time-series indicators from the World Bank; Total Population, Fertility Rate, and Life Expectancy At Birth
- Create a visualization that shows how the third world has changed over the last 50 years, in ways that you may not have expected
The World Bank API
The World Bank has a pretty good open data site at data.worldbank.org. There are some great datasets and analysis based on global development data. However, I must admit, when I was searching for specific indicators and how I would access them programmatically, it was a little frustrating to navigate. To save you some of that frustration, I've provided the URLs I use below.
All Countries, By Year 1960-2014
Fertility Rate, Births Per Woman: http://api.worldbank.org/countries/all/indicators/SP.DYN.TFRT.IN/?format=json&date=1960:2014&per_page=20000
World Bank Country Classifications: This will allow me to slice by various classifications: http://api.worldbank.org/countries/?format=json&per_page=400
Step 1: Getting the Data Into Power BI
People often think of Power BI as a data visualization tool. But it also has really powerful features for getting data, cleansing and transforming data, and mashing up data from multiple sources. Our first step is to get data into Power BI from a REST API. Here we will use 3 separate API calls to get Population, Fertility, and Life Expectancy data from 1960 through 2014. In this case, the data is returned in JSON format, which we will parse using the query editor. As you follow the steps below, note the Applied Steps section on the right. Power BI is keeping track of each transformation/change as you make it, so the next time you refresh this data it will reapply these changes.
- On the top ribbon, navigate to Get Data=>Web
2. In the From Web dialog box, paste in the url http://api.worldbank.org/countries/all/indicators/SP.POP.TOTL/?format=json&date=1960:2014&per_page=20000
3. Click On List
4. Click on To Table
5. Expand Arrows, uncheck "use original column name", click OK
6. Expand the country column, uncheck Use original column name as prefix
7. Rename columns
id = Country ID
value.1 = Country
value = Population
8. Select your Population column, select Data Type=>Whole Number
9. Select the Date column, select Data Type=>Date
10. Right click on the decimal field and remove it. Also remove the indicator ID field
11. Rename the query to Population, and click Close & Apply
12. Repeat this process for http://api.worldbank.org/countries/all/indicators/SP.DYN.LE00.IN/?format=json&date=1960:2014&per_page=20000
a) When you rename the columns, rename them as follows
id= Country Code
value.1 = Country
value = Life Expectancy
b) Change the Life Expectancy Data Type to Decimal
c) Name the query Life Expectancy and click Close and Apply
12. Repeat this process for http://api.worldbank.org/countries/all/indicators/SP.DYN.TFRT.IN/?format=json&date=1960:2014&per_page=20000
a) When you rename the columns, rename them as follows
id = Country Code
value.1 = Country
value = Fertility Rate
b) Change the Fertility Rate Data Type to Decimal
c) Name the query Fertility Rate and click Close and Apply
Step 2: Merge the Data Sets
We now have three tables; population, life expectancy, and fertility rate. But logically, each table represents a single "attribute" of a country for a given year, so I thought it would make sense to merge them into a single table. This will make it easier for end users to understand.
- On the top ribbon select Edit Queries
- On the top ribbon of the Edit Query screen select Merge Queries=>Merge Queries As New
3. Each of these tables has Country Code, Country Name, and date in common. Merge them into one by selecting the Population table from the first dropdown and the Fertility Rate on the second drop down. Then in each table, while holding Ctrl down, select Country Code, Country Name, and date in each table. You should see that it has matched 14520 out of 14520 records.
You can see 14520 out of 14520 fields are matched.
4. Click OK
5. The Fertility Rate table can now be expanded out. Click on the arrows by New Column, uncheck use original column name as prefix and select the Fertility Rate column
6. Click on the Merge1 Query on the left Query Pane. Select Merge Queries again. This time, instead of selecting Merge Queries as New, Select Merge Queries
7. The first table will already be set to Merge1. Choose the Life Expectancy table for the second drop down. On each table, select County Code, Country Name, and date.
8. Click OK and Expand the new column like you did in step 5 of this section. Uncheck use original column name as prefix and select Life Expectancy
9. Name the new merged table World Development Indicators
10. Select Close and Apply
11. Since we've merged the information from the three other tables into one table, we can hide those three tables. In the field pane, select Fertility, right click and select Hide. Repeat for Population and Life Expectancy.
We now can see one table, World Development Indicators, containing all the information we need.
Step 3: Create Scatter Plot Visualization
- Select the Scatter Chart from the Visualizations Pane
2. In the visualizations pane details, drag Country Name to the Legend, Fertility Rate to the X-Axis, Life Expectancy to the Y-Axis, and date to the Play Axis.
3. Press the Play button to watch the neat animation.
Step 4: Add World Bank Classifications table to allow grouping and filtering by country characteristics
The world bank categorizes countries based on region, lending type, and income level. We want to expand our visualization to let users filter by these additional attributes. While we could use merge the data set like we did in step 2, I wanted to keep the classification table as a completely separate table...for those of your familiar with dimensional modeling, it is essentially serving as a "dimension" table related to our indicator "fact" table. For those of you not familiar with dimensional modeling, forget I said that.
1. In the Power BI Ribbon, click Get Data=>Web and paste this URL: http://api.worldbank.org/countries/?format=json&per_page=400
2. Follow the steps listed in Step 1. (Click on List, To Table, Expand the columns, uncheck Use original column name as prefix)
3. Expand region, only select value
4. Right click on the expanded column and rename column to Region
5. Expand adminregion, select value, rename column to Admin Region
6. Expand incomeLevel, select value, rename column to Income Level
7. Expand lendingType, select value, rename column to Lending Type
8. Rename query to Country Classification, click Close and Apply
Step 5: Create a relationship between the tables
One the most powerful features of Power BI is the ability to relate two or more tables together. Many of you may have done this in Access, or perhaps you are using V-lookups in excel to do something similar. Stop. Use Power BI instead. Seriously.
1. On the left hand side, click on the Relationships icon. You'll see all the tables we've created in this exercise, but the ones we've hidden will be grayed out.
2. The two tables are related by the Country Code column in the World Development Indicators table and the iso2Code in the Country Classification table. Drag Country Code from the World Development Indicators table to the iso2Code in the Country Classification table. This allows us to "lookup" additional attributes of a country, like region, income level, and lending type.
Step 6: Add Classification Filters to your page
1. On the left pane, click back to your visualization.
2. Click anywhere on your page (except on the existing scatter chart visualization), and select Region from the Country Classification table in the fields pane
3. Select the Slicer visualization
4. On the slicer you just created, click on List, and select Dropdown. Resize the filter so it takes up less space.
5. Repeat steps 2,3, and 4 but this time select the Income Level column
6. Repeat steps 2, 3, and 4 but this time select the Lending Type column
7. Arrange the filters and resize the scatter chart to taste.
So that's it. You can now impress your friends and family with your Power BI Story Telling abilities. There are many more World Development Indicators you can get from the World Bank open data site using the same process I showed in this blog. You can get a list of them here. Also, just for fun, I made a little Power BI visualization that lets you choose indicators on the left and see the REST APIs on the right. Enjoy.