This is a follow-up to my first post on a recent project to model hate speech on Reddit. If you haven’t read that post yet, please do!

I recently gave a talk on the technical, data science side of the project, describing not just the final result, but also the trajectory of the whole project: stumbling blocks, dead ends and all. Below is the slide deck: enjoy!

Abstract

Reddit is one of the most popular discussion websites today, and is famously broad-minded in what it allows to be said on its forums. However, where there is free speech, there are invariably pockets of hate speech.

In this talk, I present a recent project to model hate speech on Reddit. In three acts, I chronicle the thought processes and stumbling blocks of the project, with each act applying a different form of machine learning: supervised learning, topic modelling and text clustering. I conclude with the current state of the project, a system that models and summarizes entire subreddits, and with possible future directions. Rest assured that both the talk and the slides have been scrubbed to be safe for work!

Slides

Modelling Hate Speech on Reddit - A Three-Act Play

