Posts, News, and Events

September 2025

Summer updates - Presentations

Presentation at the Canadian Linguistic Association - Montréal
Presentation at the Conference on Researching and Applying Metaphor - Detroit

September updates - New lab members!

Mychaela Blatta - M.A. student
Tamara Bodden - Postdoctoral fellow
Adam Podoxin - B.Sc. student
Sann Wilder - M.A. student

May 2025

We've had a very productive academic year! We've had 5 visitors, are presenting 2 papers this summer, and have annotated a corpus of metaphors. Eva Tchizmarova (SFU Linguistics) has also joined the lab.

Visiting students:

Sibilla Parlato, PhD student, Università Cattolica del Sacro Cuore, Italy
Sebastian Reimann, PhD student, Ruhr Universität Bochum, Germany

Visiting faculty:

Monika Bednarek, University of Sydney
Barbara Dancygier, University of British Columbia
Lise Fontaine, Université du Québec à Trois-Rivières

September 2024

New students join the lab!

Vanja Vekić Chen, MA student
Romina Hashemi, MA student
Jodie Lee, Undergraduate student - Undergraduate Student Research Assistant (USRA)
Amber Rynearson, MA student

June 2023

We have launched a French version of the Gender Gap Tracker! Radar de parité tracks news in six French-language media outlets in Canada.

Soumah, V-G., P. Rao, P. Eibl and M. Taboada (2023) Radar de Parité: An NLP system to measure gender representation in French news stories. Proceedings of the 36th Canadian Conference on Artificial Intelligence. Montréal. Jun 2023.

June 2022

The lab has received funding from two agencies to continue our research!! 🥳

NSERC Discovery Grant "Natural language processing for detecting toxic, abusive, and hateful language online"
OBVIA Project Grant "Mind the Gap: représentation des femmes dans les médias québécois durant la pandémie COVID19". PI: Richard Khoury, Université Laval

June 2022

New funding from OBVIA to carry out research on the representation of women in Québec media during the COVID-19 pandemic. We look forward to collaborating with colleagues at Université Laval!

February 2022

A summary of our contributions on the Gender Gap Tracker:
1. Dashboards and code:

Main dashboard with summary results: https://gendergaptracker.informedopinions.org/.
Research dashboard with text analyzer and topic modelling results: https://gendergaptracker.research.sfu.ca/
Code: https://github.com/sfu-discourse-lab/GenderGapTracker

2. Op-eds and commentary:

P. Rao, L. Chambers and M. Taboada. "Three years of monitoring gender representation in the media". Blog post. November 30, 2022.
P. Rao, M. Taboada and S. Graydon. "What we can learn from three years of data on the gender gap in news reporting". Poynter. October 28, 2021.
M. Taboada "The coronavirus pandemic increased the visibility of women in the media, but it’s not all good news". The Conversation. November 25, 2020.
M. Taboada and F. Torabi Asr. “Tracking the gender gap in Canadian media”. The Conversation. February 3, 2019. (Translated as "Big data para analizar la brecha de género en la prensa".)

3. Academic papers:

Rao, P. and M. Taboada (2021) Gender bias in the news: A scalable topic modelling and visualization framework. Frontiers in Artificial Intelligence – Language and Computation 4(82). doi: 10.3389/frai.2021.664737.
Asr, F.T., M. Mazraeh, A. Lopes, V. Gautam, J. Gonzales, P. Rao and M. Taboada (2021) The Gender Gap Tracker: Using Natural Language Processing to measure gender bias in media. PLoS ONE 16(1): e0245533.

October 2021

Op-ed on the Gender Gap Tracker and its third birthday:

P. Rao, M. Taboada and S. Graydon. "What we can learn from three years of data on the gender gap in news reporting". Poynter. October 28, 2021.

And a longer blog post with more details, publications, and statistics:

P. Rao, L. Chambers and M. Taboada, "Three years of monitoring gender representation in the media". November 28, 2021. Discourse Processing Lab.

October 2021

Op-ed alert! In this piece for Policy Options, the magazine of the Institute for Research on Public Policy, Maite Taboada argues that online toxicity has become banal, something we do without thinking.

M. Taboada. "The banality of online toxicity". Policy Options. October 6, 2021.

September 2021

Op-ed on the language of fake news, in Items, the journal of the Social Science Research Council:

M. Taboada. “Authentic language in fake news”. Items – Insights from the Social Sciences. September 7, 2021.

August 2021

Our paper received the Test of Time Award from the Association for Computational Linguistics!!!

The paper: Taboada, M., J. Brooke, M. Tofiloski, K. Voll and M. Stede (2011) Lexicon-Based Methods for Sentiment Analysis. Computational Linguistics 37 (2): 267-307.

The Test of Time Award recognizes "papers that have had long-lasting influence on the field of Natural Language Processing and Computational Linguistics."

Citation, from the video of the award ceremony: “This paper shows how a lexicon-based approach can be effective for sentiment analysis, and more importantly, also stable and portable across domains. Despite the current dominance of learning-based methods, lexicon-based methods for sentiment analysis keep being relevant, particularly in new domains where large training data isn’t available and where portability is crucial.”

June 2021

An update on our project about online news comments. Three more papers (marked NEW below) on news comments and a summary of our findings:

Raw data
- Kolhatkar, V., H. Wu, L. Cavasso, E. Francis, K. Shukla and M. Taboada (2018) The SFU Opinion and Comments Corpus: A corpus for the analysis of online news comments. Simon Fraser University. DOI: 10.25314/71358419-06df-4b07-a160-7913737ca28f.
- GitHub page, with link to download the corpus: https://github.com/sfu-discourse-lab/SOCC
Paper describing the raw data (with small annotations)
- Kolhatkar, V., H. Wu, L. Cavasso, E. Francis, K. Shukla and M. Taboada (2020) The SFU Opinion and Comments Corpus: A corpus for the analysis of online news comments. Corpus Pragmatics 4: 155–190.
Annotated data (12,000 comments), in collaboration with Jigsaw
- Kolhatkar, V., Thain, N., Sorensen, J., Dixon, L., Taboada, M., 2020. C3: The Constructive Comments Corpus. Jigsaw and Simon Fraser University. Dataset. DOI: 10.25314/ea49062a-5cf6-4403-9918-539e15fd7b52.
Paper describing the large-scale annotation
- Kolhatkar, V., N. Thain, J. Sorensen, L. Dixon and M. Taboada (to appear) Classifying constructive comments. First Monday. Available on arXiv: https://arxiv.org/abs/2004.05476.
Register analysis: Are news comments like conversations? (tl;dr: NO)
- Ehret, K. and M. Taboada (2020) Are online news comments like face-to-face conversation? A multi-dimensional analysis of an emerging register. Register Studies 2(1): 1-36.
Subjectivity analysis: How complex are news comments vs. opinion articles? (tl;dr: it's complex)
- Ehret, K. and M. Taboada (2021) The interplay of complexity and subjectivity in opinionated discourse. Discourse Studies 23(2): 141-165.
Constructiveness and toxicity across 3 newspapers:
- Op-ed. Gautam, V. and M. Taboada. 2019. “Hey Tyee commenters! Scholars studied you. Here's what they found”. The Tyee.
NEW!!! Register analysis, again. If not like conversation, what are comments like? (Answer: a hybrid register):
- Ehret, K. and M. Taboada (2021) Characterising online news comments: A multi-dimensional cruise through online registers. Frontiers in Artificial Intelligence – Language and Computation 4(79). doi: 10.3389/frai.2021.643770.
NEW!!! Appraisal analysis. Comments are very negative. They tend to express evaluation as Judgement or Appreciation (rather than Affect).
- Cavasso, L. and M. Taboada (2021) A corpus analysis of online news comments using the Appraisal framework. Journal of Corpora and Discourse Studies 4: 1-38.
NEW!!! Concessive relations in comments. Concessions have an interpersonal function and are used for evaluation and argumentation, especially in constructive comments.
- Gómez-González, MLA and M. Taboada (2021) Concession strategies in online newspaper comments. Journal of Pragmatics 174: 96-116.

We have learned a lot about online news comments. Mostly, that they are very complex and more like essays than casual conversation.

January 2021

We have been working for almost 3 years now on a project analyzing the gender gap in Canadian media. We have created a summary dashboard with overall statistics and a research dashboard analyzing topics and top-quoted sources. We can also now share the great news that a research paper on the Gender Gap Tracker has been published!

Asr, F.T., M. Mazraeh, A. Lopes, V. Gautam, J. Gonzales, P. Rao and M. Taboada (2021) The Gender Gap Tracker: Using Natural Language Processing to measure gender bias in media. PLoS ONE 16(1): e0245533.

Our findings:

In 2 years of Canadian news media, the percentage of women quoted is regularly below 30%
Women authors quote more women
Politicians dominate in the news
NLP can help us find these patterns in data

Paper on the #GenderGapTracker in @PLOSONE with the fab team in the Discourse Processing Lab: @drftasr @MohMaz_ @aleaugusto_ti @VasundharaNLP Junette Gonzales @tech_optimist @InformedOps @SFULinguistics @SFUResearch @SFU https://t.co/l4nLP1sdSm
— Maite Taboada (@maite_taboada) January 29, 2021

November 2020

Op-ed on women quoted during COVID-19:

The coronavirus pandemic increased the visibility of women in the media, but it’s not all good news. The Conversation. November 25, 2020.

November 2020

Op-ed by Lucas Chambers and Maite Taboada on media coverage of elections:

Who is quoted and who is elected? Media coverage of political candidates. Canadian Science Policy Centre. November 3, 2020.

September 2020

The lab has been busy analyzing news comments. Here, all in one place, are the papers and the data that we have produced:

Raw data
- Kolhatkar, V., H. Wu, L. Cavasso, E. Francis, K. Shukla and M. Taboada (2018) The SFU Opinion and Comments Corpus: A corpus for the analysis of online news comments. Simon Fraser University. DOI: 10.25314/71358419-06df-4b07-a160-7913737ca28f.
- GitHub page, with link to download the corpus: https://github.com/sfu-discourse-lab/SOCC
Paper describing the raw data (with small annotations)
- Kolhatkar, V., H. Wu, L. Cavasso, E. Francis, K. Shukla and M. Taboada (2020) The SFU Opinion and Comments Corpus: A corpus for the analysis of online news comments. Corpus Pragmatics 4: 155–190.
Annotated data
- Kolhatkar, V., Thain, N., Sorensen, J., Dixon, L., Taboada, M., 2020. C3: The Constructive Comments Corpus. Jigsaw and Simon Fraser University. Dataset. DOI: 10.25314/ea49062a-5cf6-4403-9918-539e15fd7b52.
Paper describing the large-scale annotation
- Kolhatkar, V., N. Thain, J. Sorensen, L. Dixon and M. Taboada (to appear) Classifying constructive comments. First Monday. Available on arXiv: https://arxiv.org/abs/2004.05476.
Register analysis: Are news comments like conversations? (tl;dr: NO)
- Ehret, K. and M. Taboada (2020) Are online news comments like face-to-face conversation? A multi-dimensional analysis of an emerging register. Register Studies 2(1): 1-36.
Subjectivity analysis: How complex are news comments vs. opinion articles? (tl;dr: it's complex)
- Ehret, K. and M. Taboada (to appear) The interplay of complexity and subjectivity in opinionated discourse. Discourse Studies.

March 2020

Fatemeh Torabi Asr just won the SFU President’s Emerging Thought Leader Newsmaker of the Year Award for 2019! We are so proud of her!

You can read the SFU press release about her award, and the story she wrote about her research on misinformation and fake news, which has reached almost 69,000 reads.

Congratulations, Fatemeh!

March 2020

Maite talked about the Gender Gap Tracker at the Women in Data Science Conference in Vancouver.

December 2019

Maite was featured in the WWEST "Best of the West" podcast. WWEST is the Westcoast Women in Engineering, Science and Technology, an organization devoted to promoting increased participation of women in STEM.

November 2019

We analyzed more than 1.5 million comments from 3 news organizations. We found that there were more constructive comments than we expected, that toxicity occurs equally across topics, and that some news outlets have better commenters than others. The full story:
- Gautam, V. and M. Taboada (2019) Constructiveness and Toxicity in Online News Comments. Report. Simon Fraser University. November 2019.

We also published a short version as an op-ed for The Tyee:

- Gautam, V. and M. Taboada. Hey Tyee Commenters! Scholars Studied You. Here's What They Found. The Tyee (online). November 6, 2019.

November 2019

SOCC, the SFU Opinion and Comments Corpus, has been available online for a while. Now the paper describing it is also online:
- Kolhatkar, V., H. Wu, L. Cavasso, E. Francis, K. Shukla and M. Taboada (to appear) The SFU Opinion and Comments Corpus: A corpus for the analysis of online news comments. Corpus Pragmatics.

September 2019

Maite is participating in a project that studies online abuse against candidates in the 2019 Canadian federal election, with Heidi Tworek from the University of British Columbia as principal investigator:

Trolls on the Campaign Trail: How Candidates Experience and Respond to Online Abuse

Preliminary results of the analysis will be made available early in 2020.

August 2019

Discourse Processing Lab postdoc Fatemeh Torabi Asr publishes an article on the lab's fake news research

Discussed in this Guardian article
Reprinted by PBS, by Nieman Lab, by Salon, by AlterNet, by The Halifax Chronicle Herald.

June 2019

Maite discusses the Gender Gap Tracker in Spanish:

Article and radio interview from Radio Canadá Internacional
Article in The Conversation España

May 2019

Paper on data quality for misinformation detection now available, open access:

Asr, F.T. and M. Taboada (2019) Big data and quality data for fake news and misinformation detection. Big Data & Society January-June 2019:1-14.

Paper too long to read? There's a video abstract!

March 2019

The lab participated again in the BCTech Summit, showcasing the Gender Gap Tracker.

See a summary of SFU's presence at the Summit.

February 2019

We launched the Gender Gap Tracker in Ottawa!

Maite attended this fabulous event with Maryam Monsef, Minister for Women and Gender Equality, and Dr. Joy Johnson, SFU's VP Research. It was hosted by Shari Graydon of Informed Opinions.

The Gender Gap Tracker, developed within our lab with Informed Opinions, monitors the proportion of women and men quoted in news stories in mainstream Canadian media (in English).

We are getting great media coverage of this event. You can check the Conversation piece for an explanation of the tracker, and see other media mentions:
- Tracking the gender gap in Canadian media (The Conversation)
- Online tool gives media outlets incentive to achieve gender parity (The Toronto Star)
- SFU partners with Informed Opinions to create Gender Gap Tracker (SFU News)
- In numbers there is strength (Ottawa Citizen and Montreal Gazette)

January 2019

A short video about Maite's research, especially on sentiment analysis.

Fall 2018

We are pleased to welcome two visiting professors to our lab. María de los Ángeles Gómez-González from the University of Santiago de Compostela will be visiting July-December 2018. Cliff Goddard from Griffith University will be at SFU October-November 2018. Welcome!

July 31, 2018

Emilie Francis successfully defended her M.A. thesis: "Misinfowars: A Linguistic Analysis of Deceptive and Credible News". Congratulations!

June 22, 2018

The Discourse Lab is hosting a talk by visiting researcher Maite Martín.

Title: Affective and Social Computing in Spanish using Human Language Technologies
Speaker: Maite Martín, Universidad de Jaén (Spain)
When: Friday, June 22, 1 pm
Where: RCB 7402

Abstract: In this talk I will present some past projects and work in progress in which my research group SINAI (Sistemas INteligentes de Acceso a la Información – Intelligent systems for information access) is involved. Our area of specialization focuses on the development of techniques and tools to solve problems related to Human Language Technologies (HLT). I will briefly discuss our research oriented to Information Retrieval Systems (IRS), mainly in the biomedical domain. We are integrating heterogeneous sources of medical and general information (UMLS, Google, SciELO, DBpedia…) in order to improve the final IRS. I will also highlight the work we have done in the field of affective computing, mainly focused on Spanish and on the social web. Although a lot of work has already been done in opinion mining, we think the real challenge is to recognize and analyse emotion expressed in textual documents. Finally, I will describe future projects related to early detection of mental health problems (depression, anxiety, cyberbullying…) by analysing the textual information written in social networks. I will show some demos implemented by SINAI.

Short Bio
Dr. Maite Martín is Associate Professor in the Computer Science department of the University of Jaén (Spain). She received her Master's degree in Computer Science at the University of Granada, and her PhD in Computer Science at the University of Málaga. She has been teaching different courses at the University since 1995. She has been a member of the research group SINAI (Sistemas INteligentes de Acceso a la Información – Intelligent systems for information access) since 2000. Her scientific interests include several areas related to Human Language Technologies such as Information Retrieval, Machine Learning, Text Mining and Sentiment Analysis. She has been a member of programme committees of several international and national conferences. In addition, she has participated in more than 30 national research projects serving as lead researcher in some of them. She has published more than a hundred conference papers, journal papers, books and book chapters. Martín is the current treasurer of the Spanish Society of Natural Language Processing (SEPLN – Sociedad Española para el Procesamiento del Lenguaje Natural). She is editor of a number of issues of the journal Procesamiento de Lenguaje Natural (Natural Language Processing). She has also been an invited speaker at several conferences.

May 2018

We participated in the BCTech Summit! We did demos of our two systems, content moderation and fake news detection. You can try them too!

May 2018

A couple of interviews on trolls and social media:

CKNW in Vancouver, Charles Adler Tonight.
CHQR in Calgary, Calgary Today with Angela Kokott.

May 2018

Article in The Conversation about our research on online news comments. Trolls, toxicity and constructive conversations.

May 2018

Maite is part of a panel discussing the documentary The Cleaners, about content moderation in social media.

April 2018

SFU News story on our research.

February 2018

Jon Alkorta is a visiting PhD researcher from the University of the Basque Country in Spain. He will be in the lab between February and May, doing research on rhetorical relations and sentiment in Basque.

January 2018

We have just released the SFU Opinion and Comments Corpus (SOCC), a corpus for the analysis of online news comments. Our corpus contains comments and the articles from which the comments originated. The articles are all opinion articles, not hard news articles. The corpus is larger than any other currently available comments corpus, and has been collected with attention to preserving reply structures and other metadata. In addition to the raw corpus, we also present annotations for four different phenomena: constructiveness, toxicity, negation and its scope, and appraisal.

Full details, and download link, are available from our GitHub project page: https://github.com/sfu-discourse-lab/SOCC

For more information about this work, please see our papers.

Kolhatkar, V., H. Wu, L. Cavasso, E. Francis, K. Shukla and M. Taboada (2018) The SFU Opinion and Comments Corpus: A corpus for the analysis of online news comments. Journal paper under review.
Kolhatkar, V. and M. Taboada (2017) Using New York Times Picks to identify constructive comments. Proceedings of the Workshop Natural Language Processing Meets Journalism, Conference on Empirical Methods in Natural Language Processing. Copenhagen. September 2017.
Kolhatkar, V. and M. Taboada (2017) Constructive language in news comments. Proceedings of the 1st Abusive Language Online Workshop, 55th Annual Meeting of the Association for Computational Linguistics. Vancouver. August 2017, pp. 11-17.

Contact:
Varada Kolhatkar (vkolhatk@sfu.ca)
Maite Taboada (mtaboada@sfu.ca)

November 15, 2017

Our postdoctoral researcher, Katharina Ehret, has been featured in an article on the Faculty of Arts and Social Sciences webpage.

Postdoctoral researcher in linguistics, Katharina Ehret, studies why online comments matter

October 18, 2017

Visiting Researcher

Dr. Cliff Goddard from Griffith University in Australia is visiting the Discourse Lab between October 18 and November 10. Dr. Goddard is a long-time collaborator, and is here thanks to an SFU-Griffith Collaborative Travel Grant.

September 2017

The lab has grown! We have two new undergraduate students, a new master's student, and two new postdocs. It'll be a busy semester!

June 30, 2017

Speaker: Muhammad Abdul-Mageed, Assistant Professor of Information Science in the iSchool at UBC.

Abstract: Accurate detection of emotion from natural language has applications ranging from building emotional chatbots to better understanding individuals and their lives. However, progress on emotion detection has been hampered by the absence of large labeled datasets. In this work, we build a very large dataset for fine-grained emotions and develop deep learning models on it. We achieve a new state-of-the-art on 24 fine-grained types of emotions (with an average accuracy of 87.58%). We also extend the task beyond emotion types to model Robert Plutchik’s 8 primary emotion dimensions, acquiring a superior accuracy of 95.68%.

May 2017

Sonya Chik will be a visiting PhD researcher in the lab until August. She is conducting cross-linguistic research on socio-semiotic processes in privacy policies, using a systemic-functional linguistics approach.

February 24, 2017

Presentation on Spark:

-- MapReduce
-- Spark DataFrame UDF
-- search engine, Spark GraphFrames
-- Spark MLlib, scikit-learn
-- Spark pipeline with CoreNLP

Installation instructions for WebAnno

January 20, 2017

Speaker: Enamul Hoque

Abstract: Analyzing and gaining insights from a large number of online conversations can be quite challenging for a user, especially when the discussions become very long. During my doctoral research, I have focused on integrating Information Visualization (InfoVis) with Natural Language Processing (NLP) techniques to better support the user’s task of exploring and analyzing conversations. For this purpose, I have designed a visual text analytics system that supports user exploration, starting from a possibly large set of conversations, then narrowing down to a subset of conversations, and eventually drilling down to a set of comments of one conversation. Our evaluations through case studies with domain experts and a formal user study with regular blog readers illustrate the potential benefits of our approach, when compared to a traditional blog reading interface.