Research

Computational analysis of text sentiment

This project addresses the problem of automatically extracting sentiment, or subjective content, from any given text. The objective is, first, to establish the best methods and algorithms for extracting sentiment from texts and, second, to implement a system that can perform sentiment classification automatically on a large corpus.

Sentiment is defined as subjective content, that is, whether a text (e.g., an opinion piece in a newspaper, a movie review, a report on a new product, an e-mail message, or a post on a bulletin board) expresses positive or negative views and opinions towards its subject matter. The hypothesis is that, given a text, we can determine whether it contains subjective material and, if it does, determine its positive or negative sentiment by parsing its discourse structure.
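As a rough illustration of the two decisions in the hypothesis (is there subjective material, and is it positive or negative), here is a minimal lexicon-based sketch in Python. It is not SO-CAL and does not parse discourse structure; the word scores, the negator list, and the function names are all hypothetical, chosen only to make the idea concrete.

```python
import re

# Hypothetical toy lexicon: words and scores are invented for illustration
# and are not taken from SO-CAL or any published dictionary.
LEXICON = {
    "good": 3, "great": 4, "excellent": 5,
    "bad": -3, "terrible": -5, "boring": -2,
}

# Hypothetical negators; a preceding negator simply flips the sign here.
NEGATORS = {"not", "never", "no"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Sum the lexicon scores of all words, flipping the sign after a negator."""
    tokens = tokenize(text)
    total = 0
    for i, token in enumerate(tokens):
        if token not in LEXICON:
            continue
        score = LEXICON[token]
        if i > 0 and tokens[i - 1] in NEGATORS:
            score = -score
        total += score
    return total


def classify(text):
    """Map the summed score to 'positive', 'negative', or 'neutral'."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(classify("A great film with an excellent cast."))
    print(classify("The plot was not good and the ending was terrible."))
    print(classify("The film opens in Paris."))
```

A text with no lexicon words scores zero and is labelled neutral, which stands in very loosely for the "no subjective material" case. Weighting words by their position in the discourse, as the hypothesis proposes, is left out of this sketch.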

Part of the project includes building a discourse parser. The first stage is a discourse segmentation tool, which you can download from the SLSeg page.

The most detailed description of the project is in the Computational Linguistics paper below. A more informal description is available in the presentation:

Taboada, M. (2007) Thumbs Up or Thumbs Down? Detecting Sentiment and Opinion Automatically. Presented at the Defining Cognitive Science Speaker Series. SFU. November 2007. Presentation slides (pdf) and a report on the presentation.
Taboada, M., J. Brooke, M. Tofiloski, K. Voll and M. Stede (2011) Lexicon-Based Methods for Sentiment Analysis. Computational Linguistics 37 (2): 267-307.

Related outputs were corpus coding using Appraisal Theory, and the SFU Review Corpus. The system, SO-CAL, is available from GitHub:

Download SO-CAL from GitHub.

Funding:

Natural Sciences and Engineering Research Council of Canada (NSERC)
Discovery Grant "A computational treatment of negation and speculation in natural language" (2015-2020)
Discovery Grant "Discourse parsing for summarization and sentiment detection" (2008-2014)
Discovery Grant "Computational analysis of text sentiment" (2003-2008)
Also funded through an NSERC University Faculty Award (2004-2010)

Participants, present and past:

Rada Trnavac (Postdoc), Jennifer Hinnell (M.A. student), Ashleigh Gonzales (M.A. student), Dennis Sharkey (M.A. student), Debopam Das (Ph.D. student), Nicola Bergen (B.A. student), Mathieu Dovan (B.A. student), Sam Al Khatib (M.A. student), Vita Markman (Assistant Professor), Milan Tofiloski (M.Sc. student), Julian Brooke (M.A. student), Patrick Larrivee-Woods (B.Sc. student), K. Montana Hay (B.A. student), Kim Voll (Ph.D. student), Caroline Anthony (B.Sc. student), Jack Grieve (M.A. student), Dennis Storoshenko (M.A. student), Katia Dilkina (B.Sc. student).

Publications, reports and manuals (please see the publications page for more related publications)

Taboada, M., J. Brooke, M. Tofiloski, K. Voll and M. Stede (2011) Lexicon-Based Methods for Sentiment Analysis. Computational Linguistics 37 (2): 267-307.
Taboada, M., J. Brooke and M. Stede (2009) Genre-Based Paragraph Classification for Sentiment Analysis. In Proceedings of 10th Annual SIGDIAL Conference on Discourse and Dialogue. London, UK. September 2009. pp. 62-70.
Brooke, J., M. Tofiloski and M. Taboada (2009) Cross-Linguistic Sentiment Analysis: From English to Spanish. In Proceedings of RANLP 2009, Recent Advances in Natural Language Processing. Borovets, Bulgaria. September 2009. -- Poster
Tofiloski, M., J. Brooke and M. Taboada (2009) A Syntactic and Lexical-Based Discourse Segmenter. In Proceedings of the 47th Annual Meeting of the Association for Computational Linguistics. Singapore, August 2009. pp. 77-80. -- Poster
Brooke, J. (2009) A Semantic Approach to Automated Text Sentiment Analysis. Master's Thesis. Department of Linguistics. Simon Fraser University.
Stede, M., M. Taboada and J. Brooke (2008) Movie Stages Annotation Manual. Guidelines for annotating functional zones or stages in movie reviews. Universität Potsdam and Simon Fraser University.
Taboada, M., K. Voll and J. Brooke (2008) Extracting Sentiment as a Function of Discourse Structure and Topicality. School of Computing Science Technical Report 2008-20.
Voll, K. and M. Taboada (2007) Not All Words are Created Equal: Extracting Semantic Orientation as a Function of Adjective Relevance. In Proceedings of the 20th Australian Joint Conference on Artificial Intelligence. Gold Coast, Australia. December 2007. pp. 337-346. Paper in pdf format.
Taboada, M., C. Anthony and K. Voll (2006) Methods for Creating Semantic Orientation Dictionaries. Proceedings of 5th International Conference on Language Resources and Evaluation (LREC). Genoa, Italy. May 2006. pp. 427-432. Paper in pdf format.
Taboada, M. and J. Grieve (2004) Analyzing Appraisal Automatically. American Association for Artificial Intelligence Spring Symposium on Exploring Attitude and Affect in Text. Stanford. March 2004. AAAI Technical Report SS-04-07. pp. 158-161. Paper in pdf format. Poster (pdf).

Related project:

The Construction of Literary Reputation in Britain: 1900-1950

The objective of this grant is to develop a pilot project to study the evolution of the literary reputations of two authors (John Galsworthy and D. H. Lawrence). Reputation is assessed based on the automatic extraction of key phrases from the authors' work and from writings concerning the authors. The project will create a database of texts, and computational tools to analyze text content automatically.

Funding:

Simon Fraser University's Social Sciences and Humanities Research Council Grant.

PI:

Mary Ann Gillies.

Co-investigators:

Paul McFetridge, Maite Taboada

Publications

Taboada, M., M. A. Gillies, P. McFetridge and R. Outtrim (2008) Tracking literary reputation with text analysis tools. Presented at the Meeting of the Society of Digital Humanities. Vancouver. June 2008. (Poster) Abstract.
Gillies, M. A., P. McFetridge, R. Outtrim and M. Taboada (2008) Finding, scanning, formatting and processing literary reviews. Presented at the Symposium of the Center for Print and Media Studies. SFU. May 2008.
Taboada, M., M. A. Gillies and P. McFetridge (2006) Sentiment Classification Techniques for Tracking Literary Reputation. Proceedings of LREC Workshop, "Towards Computational Models of Literary Analysis". Genoa, Italy. May 2006. pp. 36-43. Paper in pdf format.