I recently had a chance to catch up via e-mail with Ben Waine, winner of the 2011 PHP on Azure contest. The announcement of his victory was actually made at the Dutch PHP Conference in May, but we’ve both been extremely busy since then, so exchanging e-mails has taken a while. I only followed the contest from a distance while it was happening, but after hearing that Ben had won (I had the good fortune of meeting Ben in person at the 2010 Dutch PHP Conference) and after reading his blog series about building his application, I wanted to find out more about his experience. He does a great job of detailing that experience on his blog, so I highly suggest reading his articles if you are looking to understand the benefits and challenges of running PHP applications on the Windows Azure platform. If you are interested in learning more about Ben and his hindsight perspective on his project, read on…
Brian: For our readers, can you introduce yourself? Who are you? What do you do?
Ben: I'm a software engineer at Sky Betting and Gaming in the UK. Before that, I worked at the SEO agency Stickyeyes. During these two jobs I have also been completing my undergraduate degree in Computing at Leeds Beckett University. I'm a PHP developer focusing on back-end development. I'm a little front-end phobic!
Brian: How did you find out about the PHP on Azure contest? What piqued your interest in it?
Ben: I heard about the competition at PHPBNL11 while interviewing Katrien De Graeve about her talk on Azure. I have to confess, it was initially the prospect of going to Vegas that piqued my interest. I knew I would have to produce a large body of work over the next few months as I had a dissertation to write. As I am quite comfortable with the Linux stack, I thought it would be interesting to produce an application on Azure, a platform I wouldn't normally develop for.
Brian: Nothing wrong with wanting to go to Vegas! (I want to go too…I’ve never been there!) Were you surprised to win? (Did you think you had a chance at winning?)
Ben: I thought I might have a chance of winning. A number of Microsoft competitions have been run in recent years, and while take up is usually high, so is competitor attrition. The competitions usually end with just a few (high quality) entries. But, I knew I'd have to complete the work or fail my degree!
I knew when the competition winner would be announced (at the end of DPC11), so I'd planned to look on Twitter at 6 pm but didn't account for the time difference between Leeds and Amsterdam. When I logged in there were already loads of messages of congratulations and support. It was a great feeling!
But I was also quite surprised. The UI for my app was a little rough and ready, and a number of features I would have loved to include fell by the wayside because I had to work on my dissertation. I only had time to produce the bare-bones functionality, like harvesting data from Twitter and producing graphs and data tables (which were the most important aspects of the application).
Brian: Did the work you did on this application count towards your dissertation?
Ben: The brief set by my university was quite broad: “Produce a software project that does something”. Many people chose to develop e-commerce websites or content management systems. As a developer doing those things all day, I wanted to try something a little different. It was great to work on a project two days a week that was totally different to the work I do for my employers. I submitted the entire software piece as the practical component of the dissertation and used the data it generated in the write-up.
Brian: What gave you the idea for your Twitter Sentiment Engine (TSE)?
Ben: I've always liked Twitter; it's a really fast-paced and relevant medium. With no experience in sentiment analysis, I thought my dissertation would be a great chance to learn about this exciting subject.
Brian: In retrospect, what were the biggest pain points in using PHP on the Azure Platform? How did you overcome them?
Ben: The biggest pain point for me on Azure was deployment. Knowing little about the platform, I first used the command line tools for PHP and packaged my project for deployment via the management portal. This process generally took around 40 minutes from start to finish. Compare this to the five-minute automated deployments I'm used to and it's easy to see why deployment was frustrating at times. I plan to continue developing for Azure and have been looking into the Service Management API. My next project will be to create a build server on an Azure instance utilizing some custom Phing tasks to allow quick deployment onto Azure from a VCS.
I think PHP on Azure is still in its infancy. There is a lot of good technical content provided by Microsoft and early adopters, but there isn't yet the breadth of documentation PHP developers have come to expect. It was sometimes difficult to get answers to questions about things like application logging, deployment, and some of the tooling. However, I did always manage to figure it out in the end, as Azure does have a nice community of developers around it.
Brian: What were the biggest (nice) surprises?
Ben: The nicest surprise was SQL Azure. It was easy to set up, integrates well with SQL Management Studio, and, from reading about it, seems to be extremely reliable. Coupled with the PDO_SQLSRV driver, it showed that SQL Server is definitely PHP ready. It was a great example of how cloud computing can 'just work', putting a familiar facade in front of redundant hardware.
As of v5.3, PHP is feature complete on Windows. It wasn't necessary to change any of the preliminary work I had done on the project on a Linux platform. This is a great step for interoperability and definitely a nice surprise.
Brian: That’s great feedback…good to hear we are moving in the right direction. What advice would you give to PHP developers who are interested in the Azure platform?
Ben: Take some time to familiarize yourself with the various demos and guides provided on MSDN. Writing PHP isn't the hard bit. You don't have to change anything in the way you write code. The hard bit is deploying successfully and consistently to the platform. Do the simple 'hello world' app and then go further and deploy the 'hello world' app of your chosen framework as well. This will give you some of the knowledge required to deploy more complex real-world applications.
Brian: Sounds like good advice. What are your plans for your Twitter Sentiment Engine going forward?
Ben: I've really enjoyed working on TSE, but I know it's not a complete project. It served its purpose (gathering information for my dissertation), but there are so many features I'd like to write:
- Multi User Support
- User Trends (what other users are tracking / popular searches)
- A three tiered architecture based on message queues
- A nice UI
By the time Vegas comes round I'd like to have a full-featured application deployed on Azure. I've started to componentize the Bayesian filtering logic I used in the first iteration of TSE. It's available on GitHub: https://github.com/benwaine/BayesPHP. I plan to go through a similar process with some of the other features, like Twitter sample gathering. Each of the components will be discrete and fully unit tested. The new version of the application will also feature unit and integration tests. Having produced a good prototype, I will concentrate on quality, scalability and features in the next iteration.
Brian: Were you using Bayesian filtering to filter out Twitter spam?
Ben: The project used Bayesian classification to classify tweets into categories: positive and negative. The number of positive and negative tweets was then plotted onto a graph over time. The idea of the project was to capture real time changes in sentiment using a machine learning process that required no human interaction.
The lifecycle of a ‘keyword tracking request’ has two phases: sampling and learning, followed by classification. During the learning phase, Twitter’s search API is used to harvest tweets that contain both the specified keyword and a smiley face glyph like 🙂 or 🙁. The glyph gives the learning process a loose indication of the sentiment of each tweet. It uses these tweets to learn the words that indicate positive or negative sentiment in the context of the keyword. In the classification phase of the lifecycle, any tweet containing the keyword is classified as positive or negative based on the words that appeared in the learning phase.
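For readers who want to see the idea in code, here is a minimal sketch of the two phases Ben describes. It is not Ben's implementation (his is written in PHP and lives in the BayesPHP repository); it is a small Python illustration. The class and function names are hypothetical, and it assumes a plain naive Bayes model with add-one smoothing and a very simple tokenizer. Tweets are passed in as plain strings, so no call to Twitter's search API is shown.

```python
import math
import re
from collections import Counter

# Assumption: these glyphs stand in for the "smiley face" markers in the text.
POSITIVE_MARKS = (":)", "🙂")
NEGATIVE_MARKS = (":(", "🙁")


def tokenize(text):
    """Lowercase the tweet and keep simple word tokens (glyphs are dropped)."""
    return re.findall(r"[a-z']+", text.lower())


def label_from_smiley(text):
    """Return 'positive', 'negative', or None when the tweet has no clear glyph."""
    has_pos = any(mark in text for mark in POSITIVE_MARKS)
    has_neg = any(mark in text for mark in NEGATIVE_MARKS)
    if has_pos == has_neg:  # neither glyph, or both: no usable label
        return None
    return "positive" if has_pos else "negative"


class SentimentClassifier:
    """Hypothetical naive Bayes classifier trained on smiley-labelled tweets."""

    def __init__(self):
        self.word_counts = {"positive": Counter(), "negative": Counter()}
        self.tweet_counts = Counter()

    def learn(self, tweets):
        """Learning phase: count words per label, using the glyph as the label."""
        for text in tweets:
            label = label_from_smiley(text)
            if label is None:
                continue
            self.tweet_counts[label] += 1
            self.word_counts[label].update(tokenize(text))

    def classify(self, text):
        """Classification phase: pick the label with the higher log-probability."""
        vocab = set(self.word_counts["positive"]) | set(self.word_counts["negative"])
        total_tweets = sum(self.tweet_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            # Smoothed log prior for the label.
            score = math.log((self.tweet_counts[label] + 1) / (total_tweets + 2))
            # Add-one smoothing; the extra +1 reserves mass for unseen words.
            denominator = sum(counts.values()) + len(vocab) + 1
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / denominator)
            scores[label] = score
        return max(scores, key=scores.get)
```

To use it, call `learn()` with tweets that contain the keyword and a smiley glyph, then call `classify()` on any later tweet that contains the keyword. Counting the results per time interval would give the positive and negative series that TSE plots on its graph.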
Brian: That sounds very cool! Thanks Ben.
If you are interested in learning more about Ben’s Twitter Sentiment Engine and how he used the Azure platform, follow Ben on Twitter (but be careful what you say…he’ll be listening!) and subscribe to his blog. Ben also assured me that he welcomes contributions to his project on GitHub: https://github.com/benwaine/BayesPHP.