Data, data everywhere, but what’s the use?
There are great efforts across many research disciplines to make data available for reuse, and environmental science is one area in which this is very successful. But how do you make all this data discoverable, usable and actionable? Dozens of scientists, developers, and business people were wowed by the spectacular view at the Digital Catapult Centre in London as they gathered to see if they could figure out how to unleash the potential of data at EnviroHack2015.
There were plenty of datasets to choose from, including those of Ordnance Survey, Met Office, Environment Agency, UK Data Service, British Oceanographic Data Centre, Centre for Ecology and Hydrology, British Atmospheric Data Centre, and National Biodiversity Network - including 100 years of plant and insect species data. We at Microsoft Research provided several datasets through our FetchClimate service, developed by our Computational Science Lab. Our partners Shoothill also provided FloodAlerts and Gaugemap flood information and APIs.
We were very excited to be taking part, with Matthew Smith from our Cambridge (UK) lab getting stuck in with several teams brainstorming how to link up and make best use of data. Our dynamic data science duo from Microsoft UK, Amy Nicholson and Andrew ‘@deepfat’ Fryer, enthralled the audience with a whistle-stop tour of Azure Machine Learning and stimulated lots of discussion on how this could be used in action.
After lots of talking and planning, the hackathon teams got to work in earnest, driven by pizza and adrenalin! The Solar Checker team had a great idea to help people make the most of their solar panel installations. Tom August, from the Centre for Ecology and Hydrology, explains, “Our goal was to build an app that helps home owners with solar panels to make the most of the green energy that they produce. The current problem is that you may not know when it is best to switch on your dishwasher or washing machine to make the most of the electricity you are producing. To be able to make this decision users need to be able to predict the power generation hours in advance, something that is not currently possible.”
The team took the bit between their teeth and decided to try to build a predictive model, something that would usually take days to weeks to put together. Tom continues, “We used simulated data and Azure ML to help solve this problem. Using our simulated data of weather from the Met Office and power output from our solar panel we were able to quickly build a workflow in Azure ML that read in our data and trained a model to predict the power produced by the solar panel simply using the day of the year, time of day and cloud cover. At first our model performed pretty badly. We used a linear regression model and since the effect of time of day and day of year on the amount of sun was non-linear (it had a sine-wave form), the model couldn’t estimate our power output well.”
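To give a feel for why a straight-line fit struggles here, the following is a minimal sketch in Python rather than the team's actual Azure ML workflow. The simulated panel (a 3 kW peak, daylight between 06:00 and 18:00, a 70% loss under full cloud) and all function names are our own assumptions for illustration, and it ignores day of year for brevity. It fits one line to the raw hour of day and another to a sine-shaped daylight feature, then compares how much variance each explains.

```python
import math
import random


def simulate(n=500, seed=0):
    """Return (hour, cloud, power) rows for a hypothetical panel.

    All constants are illustrative assumptions, not measured values.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        hour = rng.uniform(0, 24)
        cloud = rng.uniform(0, 1)  # 0 = clear sky, 1 = fully overcast
        # Sine-shaped daylight curve, zero outside 06:00-18:00
        sun = max(0.0, math.sin(math.pi * (hour - 6) / 12))
        power = 3.0 * sun * (1 - 0.7 * cloud) + rng.gauss(0, 0.1)
        rows.append((hour, cloud, power))
    return rows


def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Fraction of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def compare_features():
    """Return (R^2 using raw hour, R^2 using the sine-shaped daylight feature)."""
    rows = simulate()
    hours = [r[0] for r in rows]
    power = [r[2] for r in rows]
    sine_feature = [max(0.0, math.sin(math.pi * (h - 6) / 12)) for h in hours]
    r2_raw = r_squared(hours, power, *fit_line(hours, power))
    r2_sine = r_squared(sine_feature, power, *fit_line(sine_feature, power))
    return r2_raw, r2_sine


if __name__ == "__main__":
    r2_raw, r2_sine = compare_features()
    print(f"R^2 with raw hour of day:       {r2_raw:.2f}")
    print(f"R^2 with sine-shaped feature:   {r2_sine:.2f}")
```

Because output rises in the morning and falls in the afternoon, a single slope on the raw hour has almost nothing to latch on to, while the same linear fit does far better once the input follows the shape of the daylight curve. A more flexible model, such as the neural network the team tried next, can learn that shape for itself.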
Not quite the auspicious start the team hoped for, but they persevered, and Azure ML really started to come into its own. “Next we tried a neural network model, and plugged this into our existing workflow. Using the evaluation module we were able to view the results of the models side by side. Being an R programmer I wanted to plot the results to get a handle on how good the model was. This was really easy: I just dragged over the R module, plugged it in, and within a minute or two I had the plot I wanted.”
“The final model seemed to accurately predict the power output even given the random noise I had added to the simulated data and we published the model as an API. Our app then used up-to-date weather predictions from the Met Office to predict power output over the next 12 hours. We hope that these sorts of systems will exist in smart homes of the future, allowing smart appliances to schedule their jobs to make the most of the energy the house is producing.”
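As a rough illustration of the scheduling idea Tom describes, here is a minimal Python sketch of how an appliance might choose when to run, given hourly power predictions. It is not the team's app: the function name, the hourly granularity, and the forecast values are all hypothetical, and a real system would call the published model to obtain the predictions.

```python
def best_start_hour(forecast_kw, cycle_hours):
    """Return the index of the start hour that maximises predicted solar
    output summed over an appliance cycle lasting `cycle_hours` hours.

    `forecast_kw` is a list of predicted hourly outputs (hypothetical values).
    """
    if cycle_hours < 1 or cycle_hours > len(forecast_kw):
        raise ValueError("cycle must fit inside the forecast window")

    # Total predicted output for a cycle starting at hour 0
    window = sum(forecast_kw[:cycle_hours])
    best_total, best_start = window, 0

    # Slide the window one hour at a time, adding the hour that enters
    # and subtracting the hour that leaves
    for start in range(1, len(forecast_kw) - cycle_hours + 1):
        window += forecast_kw[start + cycle_hours - 1] - forecast_kw[start - 1]
        if window > best_total:
            best_total, best_start = window, start
    return best_start


if __name__ == "__main__":
    # Made-up 12-hour forecast in kW, for illustration only
    forecast = [0.0, 0.2, 0.6, 1.1, 1.7, 2.0, 1.9, 1.4, 0.8, 0.3, 0.0, 0.0]
    print("Start the 2-hour cycle at forecast hour", best_start_hour(forecast, 2))
```

When two start hours tie, this sketch keeps the earlier one. A smarter scheduler might also weigh electricity prices or how soon the user needs the job finished.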
Kudos to Tom and the team for coming up with such a smart predictive app in such a short time, and it was great to see them win the ‘Advanced Analytics’ and ‘Best R Solution’ Awards! There are great prospects for some of the other teams to take Azure Machine Learning forward too, for example to recommend linked scientific datasets to users or, in the case of the overall winning Jelly Swarm project, to predict jellyfish blooms off the coast of the UK and globally. And we hope that you find it as easy as Tom did for your own work - “Azure ML was easy and intuitive to use and allowed us to mock up a workflow, and build a model, in a couple of hours.”