Guest post by team Spidentify, winners of the Microsoft Cognitive Services challenge at IC Hack 18
- Third Year, Computer Science MSci, King's College London
- Third Year, Computer Science BSc, King's College London
- Third Year, Computing MEng, Imperial College London
- Fourth Year, Computing MEng, Imperial College London
- Second Year, Physics MSci, Imperial College London
Running for its seventh consecutive year, IC Hack brought over 300 of the UK's most creative and talented students to London to hack on any project that takes their fancy for 24 hours.
Programming and design come together to build the technology of the future.
This year, IC Hack 18 was hosted at Imperial College London's South Kensington campus on the 27th and 28th of January 2018. It brought together programmers, designers and more from universities across the country for a 24-hour hackathon, where hackers break and innovate to produce projects that push the boundaries of technology.
Learn more on the official IC Hack 18 website.
Microsoft were looking for innovative applications of Cognitive Services, with a particular focus on the Vision set of APIs.

Challenge: The Microsoft challenge focused on the Cognitive Services APIs. The judges were looking for innovative applications of this technology, particularly the Vision set of cognitive services, since AI, data and pervasive technology are driving the future. Team solutions needed to be built using Microsoft Cognitive Services: free, easy-to-use, open-source SDKs and services that can even export models to run offline, starting with export to the CoreML format for iOS 11.
The use of the following APIs will be considered:
- Computer Vision API: Distill actionable information from images
- Face API: Detect, identify, analyse, organise and tag faces in photos
- Content Moderator: Automated image, text and video moderation
- Emotion API (preview): Personalise user experiences with emotion recognition
- Custom Vision Service (preview): Easily customise your own state-of-the-art computer vision models for your unique use case
- Video Indexer (preview): Unlock video insights
Our application, Spidentify, was a proof of concept which allowed us to differentiate between venomous creatures and their harmless doppelgangers.
After arriving at ICHack18, the annual Imperial College hackathon, and sitting through the opening ceremony, we attended the team-forming session. Being from different universities and different courses, we did not all know each other, but after a short introduction and a pitch of Jonny's idea we decided to work together on Spidentify. We were a team with a varied skillset and experience, ranging from our resident hackathon veteran, Oliver, to a physicist for whom ICHack18 was a first, Muhammad.
Spidentify has a straightforward concept: take a photo of a creature and let a computer tell you whether it is venomous or not. The initial idea came from Jonny, who discovered a spider while living in halls in the US. Unable to identify it, he had the room evacuated (read: he ran from the room, screaming). Given the advancements in, and increased access to, machine learning and computer vision algorithms, he thought it should be possible to automate the identification process and give guidance as to what the species of a spider might be, avoiding unnecessary room evacuations.
We decided that the easiest way to build a working project was to create a web application. After putting some thought into how to approach the problem, four of us attended the Microsoft Cognitive Vision presentation held soon after the hacking had officially begun. Meanwhile, Sam stayed to build a mock-up front end and brainstorm ideas. The Microsoft presentation and subsequent demos impressed us all. We were particularly interested in the Custom Vision API with its ease of use, accuracy, and flexibility. Soon after, we started exploring it by training different models, hoping they could cope with identifying numerous, very similar types of creatures.
Throughout the hackathon, we were well fed and looked after; all sponsors were approachable and our peers were accommodating. Hacking continued through lunch, dinner, and midnight pizza (and multiple snack breaks in between). At 2am, three team members returned home to catch a few hours of sleep after a hard day’s work. The reasoning behind this decision was that two of us could continue making progress throughout the night, while the sleeping shift would ensure fewer mistakes were made upon their return at 7am.
When the sleepers arrived back in time for breakfast, the hackathon was nearing its end. The front-end team had linked their work with the back-end and due to the easy deployment of the Custom Vision API, everything worked together well. Final touches were made to the Devpost submission, tutorial videos were recorded, and the project was submitted in time for the 12pm deadline.
During lunch, various judges circled around the hackspace, while groups presented their ideas and demos. We were amazed by the interest generated for Spidentify not only from the judges, but the other hackathon participants as well. There were many impressive projects that were completed, such as an emotion-based mood lighting system and a Kinect-based gaming platform, that we enjoyed watching.
Being short on time, quick Wikipedia searches on each group of animal had to suffice. We decided to focus on three groups: snakes, scorpions and spiders. There was enough information for us to mentally categorise these groups based on the biological differences that biologists use to classify them. For example, a scorpion is differentiated from other arthropods by its characteristically long tail; snakes are reptiles, classified in part by their skin covered with overlapping scales. Although this is easy for humans to identify, it is more of a challenge to train a computer-based algorithm to recognise these differences.
Every group we looked into contained a multitude of species, so we decided to incorporate at least three from each category. We particularly sought out harmless species, as well as harmful venomous creatures, as part of our proof-of-concept plan for the program.
We used the Microsoft Custom Vision API to identify venomous creatures and their harmless counterparts in our application. We initially struggled for ideas on how this could be made to work; however, within six hours, we had four working, fine-tuned classifiers.
Due to the limited number of images we could upload per project (1,000 images in total for the customvision.ai preview service), and to reduce the possibility of overfitting, we chose to limit the sample size to fewer than 30 images per class. We found we could achieve reasonable accuracy with this sample size, especially when distinguishing between similar-looking species. When we experienced unexpected drops in accuracy, we looked for features that were being misidentified and provided the neural network with counterexamples to stop it fitting that feature.
An example of this was that the majority of Black Widow spiders were pictured against the sky, as they are often found on their webs, while the Brazilian Wandering spiders were often pictured against their natural habitat: leaf litter on a forest floor. To fine-tune the model, we had to find examples of the Brazilian Wandering spider against the sky, and of a Black Widow against leaves, to train the neural network to distinguish between the spiders rather than between backgrounds.
Even after limiting the sample size, we found it was not possible to represent all the species we wanted to classify in a single project, due to the hard limit of 1,000 images and 50 classes in each Custom Vision project. To overcome this, we designed a hierarchical classification system, whereby an image is first classified into a creature type (e.g. snake, scorpion, spider) and then passed to the specific classifier for that type. A disadvantage of our approach was that there was no backtracking: if the top-level classifier misclassified an image, the wrong low-level classifier would still attempt to classify it, typically with near-0% confidence. Taking this project forward, however, it would be important to tackle this issue. It could be resolved with an algorithm whereby, if a low-level classifier fails to classify the creature, the algorithm consults the next most likely low-level classifier until a match is found.
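That fallback could be sketched as follows. This is a minimal sketch, assuming prediction results follow the Custom Vision shape ({ tagName, probability }); the per-type classifier functions are hypothetical placeholders for calls to each low-level Custom Vision project.

```javascript
// Pick the highest-probability prediction from a classifier's results.
function bestPrediction(predictions) {
  return predictions.reduce((best, p) =>
    p.probability > best.probability ? p : best);
}

// topLevelPredictions: output of the top-level (creature type) classifier.
// classifiers: map from creature type to a function returning that type's
// low-level predictions. Try the most likely type first; if its classifier
// is not confident enough, fall back to the next most likely type.
function classifyHierarchically(topLevelPredictions, classifiers, threshold = 0.5) {
  const ranked = [...topLevelPredictions]
    .sort((a, b) => b.probability - a.probability);
  for (const { tagName } of ranked) {
    const classify = classifiers[tagName];
    if (!classify) continue;
    const species = bestPrediction(classify());
    if (species.probability >= threshold) {
      return { type: tagName, species: species.tagName, confidence: species.probability };
    }
  }
  return null; // no low-level classifier was confident enough
}
```

With backtracking like this, a spider photo misfiled as "scorpion" by the top level would still reach the spider classifier once the scorpion classifier reported low confidence.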
Our back end was a fairly simple setup: a node.js server and an Azure SQL database. This was because most of the heavy processing was done by Custom Vision, and only the presentation of information was handled by the front end.
The main purpose of the back end was to store and retrieve information in the database. This would include species information such as the possible dangers of a bite and its threat level, as well as how many people had recorded being bitten by the particular species.
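The core of that lookup can be sketched as follows. In the real project the data lived in an Azure SQL table queried from node.js; here a plain object stands in for the database, and the field and species names are illustrative.

```javascript
// Stand-in for the Azure SQL table of species information
// (field names illustrative, keyed by the classifier's tag).
const speciesDb = {
  'black widow': {
    venomous: true,
    threatLevel: 'high',
    biteEffects: 'muscle pain, cramps, nausea',
    reportedBites: 0,
  },
  'huntsman': {
    venomous: false,
    threatLevel: 'low',
    biteEffects: 'minor local pain',
    reportedBites: 0,
  },
};

// Retrieve what we know about a classified species.
function getSpeciesInfo(tag) {
  return speciesDb[tag] || null;
}

// Record that a user reported being bitten by this species.
function recordBite(tag) {
  const info = speciesDb[tag];
  if (info) info.reportedBites += 1;
  return info;
}
```

The front end would call endpoints wrapping these two operations once the classifiers had produced a tag.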
We chose node.js and SQL because a couple of team members were familiar with node.js, and we had all used SQL at some point. We used Azure to host the database partly because we thought it would be cool to include it in a project already using a variety of Microsoft tools, but mainly because we found it intuitive, easy to use, and easy to incorporate into our project.
We decided to build our front end using React. Since it is built with a modular structure in mind, it allowed us to reuse components built by other people, easing our workload and increasing our productivity.
Initially, when connecting our React front end to the Microsoft Custom Vision API, the component we used was base64-encoding the image before submitting it, a format the API did not accept. After extensive testing, and two hours of head-scratching, we swapped the component, resolving the issue.
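The fix amounted to sending the raw image bytes instead of a base64 string. A minimal sketch of the request we ended up building, assuming the fetch API; the prediction URL, project ID and key in the usage comment are placeholders you would take from your own Custom Vision project.

```javascript
// Build fetch options for the Custom Vision prediction endpoint:
// the image goes in the body as raw bytes (application/octet-stream),
// not as a base64-encoded string.
function buildPredictionRequest(imageBytes, predictionKey) {
  return {
    method: 'POST',
    headers: {
      'Prediction-Key': predictionKey,
      'Content-Type': 'application/octet-stream', // raw bytes, not base64
    },
    body: imageBytes,
  };
}

// Usage (illustrative URL; take the real one from your project's settings):
// fetch('https://<region>.api.cognitive.microsoft.com/customvision/' +
//       'v1.1/Prediction/<projectId>/image',
//       buildPredictionRequest(fileBlob, predictionKey))
//   .then(res => res.json())
//   .then(result => console.log(result));
```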
The front end was built incrementally while the models were being trained. As such, the first drafts did not include any pictures except the ones uploaded by the user. We decided that after a classification was made we should present pictures of the creature in question, so that the user could cross-check whether the culprit looked the same and visually validate the result the webapp provided. To do this, we employed the Bing Image Search API v7. It allowed us to search for keywords associated with the tags of our classifiers and returned image URLs that we could embed in our implementation, displaying them to the user.
We were impressed at how easy it was to use Bing's API: we quickly obtained a key granting us access, and after that it was a matter of passing the correct argument, obtained from the Custom Vision classification, and parsing the response correctly. As a last step before the close of ICHack18, we decided it might be a good idea to include further help information that a concerned user could access. Thus, if a user indicated they had been bitten, they would be taken to a tab with a map displaying the nearest A&E.
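Our use of Bing Image Search can be sketched as follows: build a query URL from a keyword associated with the winning tag, then pull image URLs out of the response. The helper names are ours; the endpoint, subscription-key header and response fields follow the v7 API.

```javascript
const BING_ENDPOINT = 'https://api.cognitive.microsoft.com/bing/v7.0/images/search';

// Build the search URL for a keyword associated with a classifier tag.
function buildImageSearchUrl(query, count = 4) {
  const params = new URLSearchParams({ q: query, count: String(count) });
  return `${BING_ENDPOINT}?${params.toString()}`;
}

// The v7 response carries results in `value`, each with a `contentUrl`.
function extractImageUrls(response) {
  return (response.value || []).map(image => image.contentUrl);
}

// Usage (the key goes in the Ocp-Apim-Subscription-Key header):
// fetch(buildImageSearchUrl('black widow spider'), {
//   headers: { 'Ocp-Apim-Subscription-Key': subscriptionKey },
// }).then(res => res.json()).then(extractImageUrls).then(showToUser);
```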
The Final Presentation
Soon, it was time for the final presentations. As the auditorium filled, we had no idea that we would go on to present. Nervously sitting in a line, while other teams presented, we were once again struck by the quality and complexity of the projects that were completed. The presentations were well honed and the pride that presenters took in their projects was clear.
When Spidentify was shortlisted for the Microsoft Cognitive Services challenge, we were surprised: having done no preparation for a presentation to so large an auditorium, we nervously walked to the front and briefly decided who would talk and what to say. As the group before us finished, we took to the stage and delivered our presentation.
Then, came the tense wait to discover who had won. We were very happy with what we had achieved up to that point, even presenting was an achievement for us -- most of our team had never been in such a situation before, and we did not expect to win at all. When our team name was announced, we were dumbfounded. Shocked looks were exchanged, replaced by big smiles as we walked down the stairs. An experience none of us will easily forget!
There are numerous points that we would like to touch upon looking forward to possible further development.
We would need to consider how to scale up training the models that carry out the classification. While we managed to train four models to prove our concept, it is a time-demanding task, possibly requiring an expert who can distinguish between venomous creatures and their harmless counterparts.
We considered building a mobile application, or a more mobile-friendly version of the existing application, to make it more practical. While our project was a proof of concept, it would be impractical to require a distressed person to upload a photo taken on their phone to a computer in order to analyse it.
Finding a way to handle blurry images, or images in which the creature is not the predominant subject, is crucial. This would be a very important next step, because we doubt a user, in their distress, would want to get close enough to a critter to take a perfect photo.
Our help information could be localized, such that instead of providing the user with a map of the nearest hospitals, it could find and dial numbers for local hospitals and emergency services immediately.
Lastly, a different version of our application could be used for educational purposes. For example, it could be used by teachers for outdoor classes, allowing children to learn about the different creatures in their garden in an interactive way.
What we’ve learnt about ourselves post hackathon
Competing in a hackathon is always a learning experience, as you push yourself to the limits of what you can achieve in a small time-frame. We found that working under a strict time limit effectively forced us to prioritise and focus on the fundamentals.
We decided we had to make our plan as lean as possible, cutting out the flab, to increase our chances of making something that worked. Similarly, we also shared the workload in a minimalist fashion with each other noting that after we had a basic skeleton of what we wanted ready, we could always garnish it with extra extensions.
In retrospect, we found that this strategy worked out well for us. We accepted the limitations the time-frame imposed on us and were reasonable in setting targets. The hackathon taught us quite a bit about prioritising, which is a handy skill to have in professional situations.
Also, teamwork and good inter-team communication played a huge part in our ability to finish the program without many hiccups along the way. Since we had decided to split the training phase amongst four members while one worked on the front end, it was crucial to divide the training up so that we could integrate everything into one program at the end. This meant we all knew exactly what everybody was up to during the training phase, which was only possible through the regular updates everybody gave to the entire team, especially if something was being done unexpectedly differently.
Open idea-sharing and objections were also what made the process of program-building as a team seamless for us. We found that sharing our thoughts on improvements/extensions from the get-go allowed us to get better insight into what we all wanted collectively as a team to achieve in the end.
This allowed us to combine input from the entire team throughout, so that no one was particularly unhappy about what we made in the end. It made the whole process much more enjoyable, and we would all love to collaborate on extending this project, or on building a new one from scratch, in the future.
All the photos are available in their official IC Hack 18 Facebook Album.
If you're interested in learning more about Microsoft Cognitive Services and Custom Vision, there are some excellent workshops at https://github.com/MSFTImagine/computerscience/tree/master/Workshop/

If you're interested in doing some further learning, see the courses at http://aischool.microsoft.com