Which airline has the most cancelled flights? Find out using Data Visualization.

imageThis is the time of year when airline travel is peaking and I was intrigued by the data visualization provided by FlightAware called the Misery Map. I love the real time nature of the data and the weather patterns that overlay the geospatial visualization. Check it out if you haven’t already: http://bit.ly/1bITHc6 . This set me thinking about the interesting insights I could get using Power BI with Excel around airline data. Possible sources of information that I was looking at included:

·       Airline and Airport Information from the Bureau of Transportation Statistics

·       Bing Search Data available using the Bing Ads Intelligence Addin

·       Twitter Data using the Twitter Analytics for Excel AddIn

For acquiring the data I used Excel 2013 with Power Query and Power Pivot. Power Query needs to be installed from http://www.microsoft.com/en-us/download/details.aspx?id=39933 . I used Power View and Power Map for visualizing the data. Power Map for Excel can be downloaded from http://www.microsoft.com/en-us/download/details.aspx?id=38395

Working with On-Time Performance Data

The On-Time Performance data is downloadable from Bureau of Transportation Statistics Site and is accessible via LINK. The Download link enables you to specify what fields you want to download as well as filter the geography and time period of the data. This table contains on-time arrival data for non-stop domestic flights by major air carriers, and provides such additional items as departure and arrival delays, origin and destination airports, flight numbers, scheduled and actual departure and arrival times, cancelled or diverted flights, taxi-out and taxi-in times, air time, and non-stop distance. The table is rather large with 109 fields and data that ranges from 1987 to 2013 and that is a whopping 153,433,758 records. The latest data that is available is for September 2013 so I decided to download just that month:


Before downloading the data select the fields you think would be most useful for your analysis. Here are the fields that I selected:

·       Time Period:

o   FlightData

·       Airline

o   UniqueCarrier

o   FlightNum

·       Origin

o   OriginCityName

o   OriginState

·       Destination

o   DestCityName

o   DestState

·       Departure Performance

o   DepTime

o   DepDelayMinutes

o   DepDelay15

o   TaxiOut

·       Arrival Performance

o   ArrTime

o   ArrDelayMinutes

o   ArrDel15

o   TaxiIn

·       Cancellations & Diversions

o   Cancelled

o   CancellationCode

·       Flight Summaries

o   ActualElapsedTime

o   AirTime

o   Distance

·       Cause of Delay

o   CarrierDelay

o   WeatherDelay

o   NASDelay

o   SecurityDelay

o   LateAircraftDelay

Clicking Download yielded a CSV file with 510,807 rows. I have made the raw CSV file available at http://sdrv.ms/191VO8H

I used PowerPivot to import the 510,806 rows of data from the CSV file:


I also noticed that many of the fields in the OnTimeperformance table had lookup tables that I could use for more descriptive reports and visualizations.



I decided to download the UniqueCarrier lookup table and import the CSV files using PowerPivot. Once this was done, I used the diagram view in PowerPivot to establish the relationship between the two tables:


Once the tables were imported into PowerPivot, I could run some visual analysis with the data to learn some interesting insights. For instance, I wanted to find out what airline had the most flights in September. This is the count of the Flight Numbers in a bar chart:


Southwest Airlines was clearly the winner in this one followed by Delta Airlines and ExpressJet.

To find out which city had the most flights originating from it, I listed out the cities and the count of flight numbers from it:


Chicago leads here with Atlanta following really close. I had a look at destination cities as well and as is obvious it’s the same. An interesting pattern that I noticed when I plotted the number of flights per day was the dip every Saturday on the number of flights:


 The CANCELLED field in the table contains a 0 or 1 depending on whether a flight was cancelled or not. I used the column to determine which airline had the most cancellations:


ExpressJet, American Eagle and Skywest turned out to be the top 3 in the list with the most cancellations. The OnTimePerformance table has a field that indicates a code for the reasons for cancellation of flights. The codes range from A to D and there is a lookup table that describes the actual reasons for the cancellation. I decided to download the CSV file and establish a relationship between the OnTimePerformance table and the CancellationCodes table:


Once this was done, I could determine the most common causes for cancellation among the airlines:


Interestingly, the number one cause for cancellations was the carrier and not weather as I had thought it would be. Weather and Security were the less frequent reasons for flight cancellations. Security was pretty low as a reason for why flight got cancelled.

Looking at delay information, I was curious to see which airline had the biggest delays in departure. The DEP_DEL15 field has a 1 or 0 depending on whether a flight was delayed more than 15 minutes. I decided to use this field to determine what and how many flights had more than a 15 minute delay:


Southwest Airlines seems to have the most delays thus far. Looking at the reasons for delay, it appears that Arrival of the flight from its previous location seems to be the biggest reason:


Weather delays are much lower than Carrier delays and NAS delays. Looking at origin cities with the most weather related delays shows me that Chicago, Dallas and Denver lead the list:


Working with Twitter Data

To acquire twitter data for some of the Airlines you can use the Twitter Analytics Addin for Excel from http://bit.ly/YP7XaZ . For the analysis, I picked the top 5 Airlines from Airfarewatchdogs 2013 rankings :

·       Frontier

·       Virgin America

·       JetBlue

·       Alaska

·       Southwest


The Twitter Analytics Addin returns some interesting insights about the Airlines that I ran a search for. The image below shows the tweets for 11/27-12/4 for the 5 Airlines:


Virgin America shows a big spike on 11/29 which was probably due to Virgin America’s black Friday deals. I drilled down into the VirginAmerica tweets to look at the most popular hashtags associated with them and noticed that cybermonday and blackfriday were among the top few, cybermonday being the highest:


Here is a video walkthrough of the tweets across the country using Power Map. The popular tweets have a correlation with what airline is popular in different parts of the country:


Working with Bing Search Data

To work with Bing Search data, I did the following:

1.       Create an account at http://bingads.microsoft.com

2.      Download the Bing Ads Intelligence AddIn for Excel.

I decided to run a Search Traffic analysis for the Top 5 Airlines in the US for 2013. In addition, I also ran a query to find out the age and gender breakdown for the searches for the airlines for 2013. The last query that I ran was to get an analysis by state on the searches for the airlines. Here is the analysis that I visualized using Power View:


Southwest seems to have the highest searches for 2013. I also ran a visualization to show the male/female ratio of the searches: females show a slightly higher percentage than males across all the airlines:


An age breakdown of the data shows the following:

·       JetBlue leads in the 18-24 group

·       Virgin America leads in the 25-34 & 35-49 groups

·       Southwest leads in the 50-64 & 65+ groups



Using Power Map, I then mapped the search data to states to get this insight:




I have made the final Excel Spreadsheet with all of the data including:

·       On Time Performance Data from the Bureau of Transportation Statistics

·       Twitter Data for the Top 5 US Airlines for 2013

·       Bing Search Data for the Top 5 US Airlines for 2013

·       Visualizations for all of the data

The spreadsheet can be found at http://sdrv.ms/191VO8H I would love to hear your feedback, ideas and suggestions.

Comments (2)

  1. Matt Smith says:

    Looks like your Twitter add-in for Excel url needs to be updated.


  2. shishirs says:

    Thanks for letting me know. The Twitter Analytics for Excel Addin is now available at husting.com/twitter-analytics-for-excel