faculty research spotlight
Maite Taboada presents text classification technique that detects toxicity in online comments
The research pursued by Distinguished SFU Professor and Royal Society of Canada Fellow Maite Taboada and her team at the Discourse Processing Lab at SFU Linguistics combines discourse analysis and computational linguistics, with an emphasis on discourse relations and sentiment analysis. Taboada's current work focuses on the analysis of online comments, drawing insights from corpus linguistics, computational linguistics, and big data. Other projects include a study of fake news online and the Gender Gap Tracker.
On October 23rd, Taboada delivered a talk at the Centre for Information Technology (CiTIUS) in Santiago de Compostela, Spain. As part of a joint initiative between the HYBRIDS project and IBERIFIER, CiTIUS brought together experts to explore the intersection of AI and disinformation. During her talk, "Exploring constructiveness and toxicity in online comments," Taboada described how her lab developed a text classification technique that identifies constructive comments.
Online commentary today is rife with abusive comments, toxic behaviour, harassment, misinformation, and fake news. Taboada proposes that some of these problems can be solved with text classification techniques. If ‘nice’ comments can be distinguished from ‘nasty’ ones, then we can filter out the latter and promote civil online discourse. If we can reliably identify misinformation and fake news stories, then we can stop them before they spread online.
Taboada's Discourse Processing Lab is actively working on these issues with an ongoing research project that aims to identify constructive comments on news stories. Using both ‘classic’ machine learning (Support Vector Machines with linguistic features) and deep learning methods, Taboada and her team have built a classifier to identify constructive comments, defined as those that are related to the article, intend to create a civil dialogue, and provide specific points supported by evidence.
The classifier was built using a large annotated corpus of 12,000 comments from the Canadian daily publication The Globe and Mail. Results show that constructiveness can be identified reliably and that a mix of features characterizes constructive comments. The features that reliably characterize constructive online comments include length, specific points, and the presence of personal stories.
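To make the idea of linguistic features concrete, here is a minimal Python sketch of how surface cues like those named above (length, specific points, personal stories) might be turned into numbers for a classifier. It is an illustration only, not the lab's implementation: the function name extract_features and both cue lists are hypothetical, and simple word matching is a crude stand-in for real linguistic analysis.

```python
import re

# Hypothetical cue lists, chosen for illustration only.
# They are NOT the feature set used by the Discourse Processing Lab.
PERSONAL_PATTERN = re.compile(r"\b(?:i|my|me|we|our)\b", re.IGNORECASE)
EVIDENCE_PATTERN = re.compile(
    r"\b(?:because|for example|according to|study|data|percent)\b|\d+",
    re.IGNORECASE,
)


def extract_features(comment: str) -> dict:
    """Map a comment to three rough surface features."""
    tokens = comment.split()
    return {
        # Length in whitespace-separated tokens.
        "length": len(tokens),
        # First-person words as a crude proxy for personal stories.
        "personal_cues": len(PERSONAL_PATTERN.findall(comment)),
        # Numbers and evidence phrases as a crude proxy for specific points.
        "evidence_cues": len(EVIDENCE_PATTERN.findall(comment)),
    }


if __name__ == "__main__":
    examples = [
        "You are wrong.",
        "I rode this bus route for ten years, and according to the city "
        "data ridership fell 12 percent.",
    ]
    for text in examples:
        print(extract_features(text))
```

In a setup like the one described, feature vectors of this kind would be computed for each annotated comment and passed to a Support Vector Machine or similar classifier for training. The sketch stops at feature extraction so that it runs with the Python standard library alone.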
The goal of Taboada's project is to build a moderation platform to allow constructive comments to be featured more prominently. Ultimately, featuring constructive comments and filtering out toxicity helps to encourage genuine, civil debate in online spaces.