Research

My research spans two distinct but overlapping areas: discourse analysis and computational linguistics. In discourse analysis, I study the mechanisms of coherence in discourse, focusing on how links across sentences produce the impression of coherence in text and speech. My discourse research is corpus-based and data-driven, and my contributions involve the creation of (annotated) datasets or corpora for analysis. In computational linguistics, I develop methods and algorithms to process and exploit discourse structure in different applications. In recent years, I have focused on two specific areas at the intersection of discourse and computational linguistics: discourse relations and sentiment analysis. This research is at the boundary between theoretical and applied fields, between the social sciences and engineering.

Current projects involve:

A project examining the nature of online news comments, those posted in response to a news article and, more specifically, to an opinion article. This is a corpus and computational project.
- Corpus: SOCC, the SFU Opinion and Comments Corpus. More than 630,000 comments posted on the site of The Globe and Mail. Publicly available.
- Computational work: a system to moderate comments, identifying constructive comments -- those that contribute to the conversation, creating civil dialogue.
A project on detecting fake news and misinformation. Work in progress includes:
- MisInfoText, a dataset of fact-checked news articles.
- A system to detect fake news online.
The Gender Gap Tracker, developed within our lab and in collaboration with Informed Opinions, monitors the proportion of women and men quoted in news stories in mainstream Canadian media (in English).

More information on these projects is available from the site of the Discourse Processing Lab.

Links to past projects:

In past work, I focused on how discourse can be coherent through belonging to a particular genre or register of discourse, and how participants work together to weave a coherent piece of discourse. I have concentrated on two different mechanisms that I consider to be the two sides of the coherence coin: entity-based coherence and relational coherence. The former refers to the relations established among entities or referents in the text (i.e., anaphoric relations, such as the relationship between a pronoun and its antecedent), whereas the latter is conveyed through relations among propositions in the text (e.g., elaboration, summary, contrast, condition, cause). The two sides of this notion of coherence have been widely studied, as cohesion and coherence, or as referential coherence and macrostructures. My approach is to examine both in detail, aiming to integrate them into a unified theory of discourse, with special emphasis on how they interact with different genres of discourse, and on how we can model discourse for various computational applications.

My work on discourse in general is quite broad, encompassing a number of topics that all lead to the study of coherence. In brief, I have studied:

Genre and register, and different instantiations of them, such as task-oriented conversation or movie reviews
Turn taking
Information structure
Rhetorical and coherence relations
Cohesion and Centering Theory
Appraisal and subjectivity in text
Computational applications: modeling of conversation for speech applications, and detecting sentiment automatically

I have written articles on the information structure of discourse and Rhetorical Structure Theory. I have also applied cohesion analyses to a corpus of task-oriented conversations. My book Building Coherence and Cohesion (2004, John Benjamins) is a contrastive study of task-oriented conversations from a genre-based point of view, that is, a particular type of conversation as an instance of a genre. I explain how the conversations come together as a single text, through the study of thematic patterns, rhetorical relations and cohesive relations.

A project on anaphora in English and Spanish used Centering Theory. We studied how subjects and topics are expressed in conversation, and how pronouns and other referring expressions are selected in discourse (i.e., why a full noun phrase might be chosen over a pronoun to refer to the same entity).

Within computational linguistics, I have worked on formal models of conversation structure, and the application of discourse processing in machine translation. I have also done some work on conversation policies for communication among software agents. Before coming to SFU, I worked for a company, where I led the design and implementation of a commercial natural language processing/artificial intelligence system.

My work on sentiment analysis led to a computational system to identify whether a text is positive or negative. The system, SO-CAL, or Semantic Orientation Calculator, extracts sentiment and opinion using words and phrases. SO-CAL is described in an article in the journal Computational Linguistics. Another aspect of this project involved the annotation of evaluative texts (mostly reviews posted online) following Appraisal Theory. The most recent work in this project is the collection of a corpus of opinion articles and news comments. We are analyzing them in terms of register, and trying to determine whether they are constructive and/or toxic. More recent work is listed on the website of the Discourse Processing Lab.

I am part of a research group based at the Universidade de Santiago de Compostela, Spain, studying discourse typology, with María de los Ángeles Gómez-González as PI. Our group is called SCIMITAR (Santiago-Centred International Milieu for Interactional, Typological and Acquisitional Research). In Spain, I also collaborate with the CONTRANOT group, a group that studies functional linguistics and its applications, led by Julia Lavid at the Universidad Complutense de Madrid.

In 2010, I spent 8 months at the Department of Informatics of the University of Hamburg in Germany, collaborating with Christopher Habel on coherence relations in multimodal documents. My stay was supported by a Fellowship for Experienced Researchers from the Alexander von Humboldt Foundation.

Other interests include systemic functional linguistics, second language teaching and learning, and Spanish linguistics.

I am an Associate Member in the School of Computing Science, and also Associate Member in the Cognitive Science Program at SFU.

I am in charge of the web site for Rhetorical Structure Theory, which also has a mailing list associated with it.

At SFU, I manage corpora acquisition from the Linguistic Data Consortium. We own a number of materials; you can find a list on the LDC at SFU web page. If you are a student, faculty member or staff member at SFU, you have access to them.

There are regular opportunities for research positions in my research group: as Research Assistant, for graduate or undergraduate students; and for graduate students to do their theses. If you are a student at SFU, or if you are considering graduate studies at SFU, and are interested in any of these areas, please get in touch with me about opportunities for research (mtaboada @ sfu.ca).

Interests

Discourse Analysis
Computational Linguistics
Corpus Linguistics
Rhetorical Structure Theory
Sentiment analysis
Systemic Functional Linguistics
Appraisal Theory
Social Media Language
News Comments
'Fake' news and misinformation

Maite Taboada

SFU
