Windows Azure Stories: XML Travelgate processing more than 20 million booking requests per day in the cloud

Today’s post, written by Pedro Brücher, CEO at XML Travelgate, describes how the company uses Windows Azure Platform as a Service (PaaS) capabilities to run its B2B booking platform. You can also read his recent interview in the BizSpark Featured Startups blog.


XML Travelgate is a company founded in 2012 in Palma de Mallorca (Spain). Our goal is to provide technology partnerships for OTAs, tour operators, wholesalers, incoming agencies, etc. Our philosophy is to do only one thing and be the best at it. That is why we are 100% dedicated to XML integrations for the travel industry. We specialize in developing integrations and solve only the problem of connectivity between systems. Our mission is to help companies connect to each other seamlessly so they can instantly buy from and sell to each other.

The Product

We think that nowadays integrations are the very core of any travel company, and therefore it is crucial to keep them well maintained. We develop and maintain integrations between travel product suppliers, such as hotels and airlines, and travel product sellers, such as (online) travel agencies or wholesalers. We have already developed more than 150 integrations for several different travel products: flights, hotels, ferries, rail, car rentals and transfers.

The Motivation

Besides developing integrations, the other half of our mission is maintaining them. To do it well, we constantly monitor the traffic and status of every request processed by our systems. That requires total control of who is using our integrations, which means we must host them in a controlled environment to be able to log every request and status. So we have built a web service that allows our clients to send us requests, which we forward to the required supplier. Since our inception, the strategy has been to use a cloud service for three reasons: paying only for what we need (we could not predict the traffic we would eventually have), scalability, and savings in sys-admin costs.

The Requirements

As our integrations were developed using Microsoft technology, it seemed natural to use the Microsoft cloud. Nevertheless, before we chose Windows Azure we studied the different cloud platforms available to determine which was most suitable. The requirements of our application were:

1. Scalability: We needed to build an application that could be scaled out horizontally.

2. Reliability: Our service must be stable. Any downtime could mean the loss of a client and could potentially put our company and our partners out of business.

3. Performance: Our clients are other machines connecting to our web service as opposed to people accessing web sites. They have a timeout limit for every request and after that the connection is simply dropped.

4. On-demand: the travel industry is heavily seasonal. In the high season (summer) traffic can increase up to 10 times. We must be flexible and allow for those peaks without any downtime.

5. Reaction Time: as part of our maintenance service we must be able to deploy updates and improvements to our integrations very quickly, because even a small bug can cause losses.

6. Price: being a startup we cannot afford to buy machines and infrastructure to run our service. We must be able to add hardware capacity as the business grows.

7. Audit: ability to save requests for later inspection.

The Architecture

To fully understand the architecture of our application, one has to understand how it handles requests. When machines connect to our service they have a timeout limit, and if no response is received within it the connection is dropped and the request is lost. We cannot send them the typical “Please be patient your request is taking longer than expected” message. In simplified terms, once the web server receives a request it must instantiate the library in charge of connecting to the supplier. Once the supplier's response is received, this library processes the data and delivers the result back to the web server, which in turn returns it to the client.
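The timeout constraint described above can be illustrated with a minimal Python sketch. It is not our production code: `call_supplier`, `handle_request` and the 0.2 second deadline are hypothetical names and values chosen only for illustration, and the supplier call is simulated with a sleep.

```python
import concurrent.futures
import time

# Hypothetical deadline; the actual client timeout is not stated in this post.
CLIENT_TIMEOUT_SECONDS = 0.2

# Shared pool so a slow supplier call cannot hold the caller past the deadline.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=8)


def call_supplier(request_xml, delay):
    """Stand-in for the integration library; `delay` simulates supplier latency."""
    time.sleep(delay)
    return "<response for='%s'/>" % request_xml


def handle_request(request_xml, delay, timeout=CLIENT_TIMEOUT_SECONDS):
    """Return the supplier response, or an error document once the deadline passes."""
    future = _pool.submit(call_supplier, request_xml, delay)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        # The client would drop the connection anyway, so answer with an
        # explicit error instead of waiting any longer.
        return "<error reason='supplier timeout'/>"
```

A fast supplier yields the normal response, while a slow one yields the error document as soon as the deadline expires, which is the behavior a machine client with a hard timeout needs.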

We have 3 APIs (transportation, hotel and car) and we wanted them to be totally separate services that do not depend on each other. It made sense to have only one entry point for all requests and pass the responsibility of instantiating libraries further down the line. For that we needed machines specialized in each API and some way of passing the data from the web server to these machines. When we first looked into Windows Azure we realized that it offered exactly what we needed: the infrastructure fit our needs perfectly and most of our initial requirements were met. All we needed to do was decide which cloud to use.
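The single entry point idea can be pictured as a small routing table. The sketch below is only an illustration in Python: the handler functions and the `dispatch` name are hypothetical, and plain function calls stand in for the machines specialized in each API.

```python
# Hypothetical handlers, one per API named in the text.
def handle_hotel(payload):
    return "hotel:" + payload


def handle_transportation(payload):
    return "transportation:" + payload


def handle_car(payload):
    return "car:" + payload


# The entry point only knows which worker owns which API; the workers
# themselves are independent of each other.
WORKERS = {
    "hotel": handle_hotel,
    "transportation": handle_transportation,
    "car": handle_car,
}


def dispatch(api, payload):
    """Forward a request to the worker responsible for `api`."""
    try:
        worker = WORKERS[api]
    except KeyError:
        raise ValueError("unknown API: %r" % api)
    return worker(payload)
```

Keeping the routing table as the only shared piece means a new API can be added, or an existing one redeployed, without touching the other workers.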


Why Windows Azure?

We narrowed down our choices to the two main cloud providers: Windows Azure and Amazon Web Services. First we looked at the costs: both were pretty much the same. Then we looked at the infrastructure, which was also very similar. We were already working with Microsoft technology, and Microsoft is a very stable and trustworthy company. But the main advantage for us was that with Windows Azure we could use a Platform-as-a-Service (PaaS) approach, so we could focus on our product and not on managing the infrastructure. That increased our agility and reduced our risks. It just seemed very logical to use Windows Azure. And today we can say with confidence that we’ve made the right choice!

Under The Hood

Our Windows Azure solution consists of the following Azure services:

1. Web Role Cloud Service: In charge of receiving every request and forwarding it to the appropriate worker role using WCF. Our first solution used the Service Bus to pass messages between web and worker roles. Although it worked, we had problems when passing more than 100 messages per second, so we moved to WCF. When a request arrives or leaves, an event is triggered and sent to an Azure Queue for statistical purposes.

2. Worker Role Cloud Service: We built specialized worker roles for each of our 3 APIs: hotel, transportation and car. This is where our client’s request is transformed into the supplier’s request format and sent to the supplier, and where the supplier’s response is transformed back into our client’s response format. If the message needs to be audited, it is sent to a Service Bus. When the worker receives a request or finishes handling it, an event is triggered and sent to an Azure Queue for statistical purposes.

3. Audit (Worker) Role: Subscribes to the Audit Service Bus and saves messages to Blob Storage when needed.

4. Azure Queues: Used to store statistical information that is later inserted into the SQL database.

5. SQL Database: Stores statistical information about every request with its status and time consumed.

6. Service Bus: Used to store messages for auditing purposes.

7. Blob Storage: Used to store audited XML messages exchanged with suppliers.
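The statistics path in this list (roles emit events to a queue, and the events later end up in a SQL database) can be sketched as follows. This is a simplified Python illustration, not our implementation: an in-memory `queue.Queue` stands in for Azure Queues, SQLite stands in for SQL Database, and the `record_event` and `drain_to_database` names and the table layout are assumptions. The request status, which we also store, is left out for brevity.

```python
import queue
import sqlite3

events = queue.Queue()  # in-memory stand-in for the statistics queue


def record_event(request_id, role, phase, timestamp):
    """Called by a role when a request arrives ('in') or leaves ('out')."""
    events.put((request_id, role, phase, timestamp))


def drain_to_database(conn):
    """Pair 'in' and 'out' events per request and store the elapsed time."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS request_stats "
        "(request_id TEXT, role TEXT, seconds REAL)"
    )
    started = {}
    while True:
        try:
            request_id, role, phase, timestamp = events.get_nowait()
        except queue.Empty:
            break
        key = (request_id, role)
        if phase == "in":
            started[key] = timestamp
        elif key in started:
            conn.execute(
                "INSERT INTO request_stats VALUES (?, ?, ?)",
                (request_id, role, timestamp - started.pop(key)),
            )
    conn.commit()
```

Because the roles only enqueue events and a separate step writes to the database, recording statistics adds very little work to the request path itself.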

This solution is deployed in 2 datacenters balanced by Windows Azure Traffic Manager. So far it has given us virtually unlimited scalability. We use Paraleap’s Azure Watch to scale our systems up and down according to CPU and many other performance counters. We are pushing Windows Azure to its limits by adding a lot of traffic, and so far, after some tuning, the response has been more than satisfactory.

The Figures

Peak Total CPU Cores: 220

Total Suppliers: 116

Average Requests Per Day: 20,000,000+

Average Bookings Per Day: 5,000+

Peak Requests Per Second: 400+

In production since: November 1st, 2012
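To put the daily and per-second figures above on the same scale, the short Python calculation below converts the daily average into requests per second and compares it with the peak. Both inputs are lower bounds (the "+" in the figures), so the results are only indicative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

requests_per_day = 20_000_000   # "Average Requests Per Day" (lower bound)
peak_per_second = 400           # "Peak Requests Per Second" (lower bound)

average_per_second = requests_per_day / SECONDS_PER_DAY
peak_to_average = peak_per_second / average_per_second

print(round(average_per_second, 1))  # 231.5
print(round(peak_to_average, 2))     # 1.73
```

In other words, the platform handles roughly 230 requests per second on average, and the reported peak is a little under twice that rate.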