Collaborating with IIIT-H to combat abusive language through deep learning

Winner of the Best Poster Presentation at World Wide Web (WWW) 2017

What happens when machines become a little too human? What if, in learning from observing human interactions, a machine picks up and mimics the less savoury elements of our species?

Misadventures in machine learning highlight an important problem, viz. the inability to distinguish between socially acceptable and inflammatory speech. Machines can not only be tricked into giving racist, sexist, and politically incorrect responses but can also be sabotaged to amplify controversial and hurtful sentiments that are commonplace online.

Inculcating social diplomacy into machines is therefore clearly one of the next big challenges in developing artificial intelligence. This is no mean task, given the inherent complexity of natural language constructs, the different forms hatred can take, the diversity of its targets, and the many ways of expressing the same meaning.

Algorithm to detect hate speech

Given a set of tweets, we started a project to determine whether each one was sexist, racist, or neither. I worked on it with IIIT-H student researchers Pinkesh Badjatiya and Shashank Gupta, and with Prof. Vasudeva Varma, who heads the Information Retrieval and Extraction Lab at IIIT-H. We experimented with multiple deep learning architectures to detect hate speech on Twitter. A thorough analysis of a dataset comprising 16,000 annotated tweets showed that deep learning algorithms can outperform traditional char/word n-gram and logistic regression methods by a statistically significant margin.

The model was trained in a supervised learning setup. The algorithm learned by sifting through 16,000 tweets, of which 3,383 were labelled as “sexist”, 1,972 as “racist”, and the rest as neither. The machine acquired the ability to differentiate by ‘learning’ from these examples, much the same way a human being does.
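As a minimal sketch of this setup (assuming the annotated tweets are available locally as a CSV file with “tweet” and “label” columns; the file name below is hypothetical), the three classes can be mapped to integer targets and a test set held out while preserving the skewed class proportions:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical local copy of the ~16,000 annotated tweets.
df = pd.read_csv("annotated_tweets.csv")

# Map the three annotation classes to integer targets.
label_to_id = {"none": 0, "racist": 1, "sexist": 2}
df["y"] = df["label"].map(label_to_id)

# Stratified split so the skewed class proportions are preserved in the test set.
train_texts, test_texts, y_train, y_test = train_test_split(
    df["tweet"], df["y"], test_size=0.2, stratify=df["y"], random_state=42
)
```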


The team investigated both traditional methods and deep neural network architectures for the task of hate speech detection, and found the deep architectures to significantly outperform existing methods. The traditional methods included features such as Term Frequency–Inverse Document Frequency (TF-IDF) and Bag of Words Vectors (BoWV).
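For illustration, a typical traditional baseline along these lines — character n-gram TF-IDF features fed to logistic regression, reusing the split from the earlier snippet — might look as follows. This is a sketch, not the exact configuration used in the study:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# Character n-gram TF-IDF features over the raw tweet text.
vectorizer = TfidfVectorizer(analyzer="char", ngram_range=(1, 4))
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

# Logistic regression classifier over the sparse TF-IDF features.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("weighted F1:", f1_score(y_test, clf.predict(X_test), average="weighted"))
```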

Various deep learning architectures, such as Convolutional Neural Networks (CNNs), Long Short-Term Memory networks (LSTMs), and FastText, were also tried for the task. These were initialized using either random embeddings or embeddings from Global Vectors for Word Representation (GloVe). Finally, the models were either trained end to end as neural networks, or the neural network’s learned representations were used to train another supervised classifier.
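The sketch below shows one such variant, assuming TensorFlow/Keras: an LSTM over word embeddings (randomly initialised here; a GloVe weight matrix could be supplied instead) trained end to end on the three classes. The hyperparameters are illustrative rather than those from the study:

```python
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

MAX_WORDS, MAX_LEN, EMBED_DIM = 20000, 30, 200

# Turn tweets into padded integer sequences.
tokenizer = Tokenizer(num_words=MAX_WORDS)
tokenizer.fit_on_texts(train_texts)
X_train = pad_sequences(tokenizer.texts_to_sequences(train_texts), maxlen=MAX_LEN)
X_test = pad_sequences(tokenizer.texts_to_sequences(test_texts), maxlen=MAX_LEN)

# Randomly initialised embeddings -> LSTM -> 3-way softmax.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(MAX_WORDS, EMBED_DIM),
    tf.keras.layers.LSTM(64, dropout=0.25),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X_train, y_train, validation_split=0.1, epochs=5, batch_size=128)
```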

The team found that LSTMs initialized with random embeddings, when combined with gradient boosted decision trees, gave the best results, with accuracy values of up to 93%.
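A rough sketch of that combination, continuing from the model above: the trained LSTM is reused as a feature extractor, and its per-tweet representations are fed to gradient boosted decision trees (scikit-learn’s implementation stands in here for whichever GBDT library was actually used):

```python
from sklearn.ensemble import GradientBoostingClassifier

# Reuse the trained network up to the LSTM layer as a feature extractor.
feature_extractor = tf.keras.Model(inputs=model.input,
                                   outputs=model.layers[-2].output)
F_train = feature_extractor.predict(X_train)
F_test = feature_extractor.predict(X_test)

# Gradient boosted decision trees over the LSTM-derived tweet features.
gbdt = GradientBoostingClassifier().fit(F_train, y_train)
print("test accuracy:", gbdt.score(F_test, y_test))
```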

The ground-breaking study was presented at the World Wide Web (WWW) 2017 conference in Perth, Australia, this year, and the team’s poster was voted the best poster presentation amongst 64 others.

Paving the way for a sanitized web

We believe that this research will pave the way for smarter, more self-aware computing. Current systems and techniques can detect hate speech with only 65% accuracy. Advanced algorithms based on this research can crawl the web and detect hate speech far more accurately. Deep learning and supervised learning can help boost the accuracy of such detection systems.

Machines and chatbots can indeed be taught to be more civilized by advanced algorithms that filter and clean machine responses and interactions with human beings. Developing social etiquette for chatbots will ensure that the next Tay is a lot more self-aware and discerning.

Nevertheless, sophisticated hate speech detection goes well beyond creating polite robots; it can make the Internet safer for everyone.

Detecting hate speech can have a powerful effect on human interactions online as well. A smarter algorithm like the one we developed can help mitigate problems such as cyberbullying, celebrity defamation, politically charged rhetoric, and abusive remarks online.

The tool can be integrated into other online platforms to help make online recommendations more culturally sensitive. For example, users on Yammer can be steered away from online groups that are deemed hateful, inappropriate, or abusive.

We believe these deep learning capabilities can be extended to a large number of applications on the web. With more accurate detection, online platform moderators can reduce the level of hate and vilification online. Controversial topics can be moderated, children’s online experience can be sandboxed, and vulnerable communities can be better protected.

Conclusion

This project was an outcome of our vision for a safer, more inclusive web. The team’s success with the algorithm serves as an example of what can be accomplished when enterprise resources are combined with academic research.

For the past four years, I have worked with many IIIT-H students on a variety of interesting machine learning projects, leading to several collaborative publications at premier venues. Overall, at Microsoft, we’re committed to leveraging collaborations with academia to propel computing and innovative technology research forward.