This page provides additional
material for Debopam Das' dissertation, "Signalling of
Coherence Relations in Discourse", completed in August 2014
in the Department of Linguistics at Simon Fraser
University.
The thesis is available via the SFU
Library: http://summit.sfu.ca/item/14446
The corpus compiled as part of the thesis
is available through the Linguistic Data Consortium.
From this page you can download statistics
on the above corpus, as a zip file: Das_PhD_Supplementary_Material.zip
Details on the zip file:
- The material contains two types of
distributions in the RST Signalling Corpus:
  - The statistical distribution of
  coherence (rhetorical) relations with respect to the
  signals used to indicate those relations, and
  - The statistical distribution of
  signals with respect to the relations indicated by
  those signals.
- The RST Signalling Corpus, which
is built over the RST Discourse Treebank, includes a
collection of over 20,000 coherence relations
annotated for signalling information. The corpus
provides signalling annotation for the 78 relation
types present in the RST Discourse Treebank.
- The signals used for annotation
are divided into three broad classes: Single Signal,
Combined Signal and Unsure.
  - The class Single Signal is
  divided into nine signal types: discourse marker
  (DM), reference, lexical, semantic,
  morphological, syntactic, graphical, genre and
  numerical features.
  - The class Combined Signal is
  divided into six signal types: (reference +
  syntactic), (semantic + syntactic), (lexical +
  syntactic), (syntactic + semantic), (syntactic +
  positional) and (graphical + syntactic).
  - These signal types are
  further divided into numerous specific signals.
- More information about the RST
Discourse Treebank and the RST Signalling Corpus can
be found in the dissertation.
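The signal classes and signal types listed above can be sketched as a small lookup structure. This is only an illustrative sketch: the names SIGNAL_TAXONOMY and signal_class_of are hypothetical and do not come from the corpus or the zip file, and the specific signals below each signal type are not included.

```python
# Signal classes and their signal types, as listed on this page.
# SIGNAL_TAXONOMY and signal_class_of are hypothetical names used
# for illustration; they are not part of the corpus distribution.
SIGNAL_TAXONOMY = {
    "Single Signal": [
        "discourse marker (DM)",
        "reference",
        "lexical",
        "semantic",
        "morphological",
        "syntactic",
        "graphical",
        "genre",
        "numerical",
    ],
    "Combined Signal": [
        "reference + syntactic",
        "semantic + syntactic",
        "lexical + syntactic",
        "syntactic + semantic",
        "syntactic + positional",
        "graphical + syntactic",
    ],
    # The page lists no signal types for the Unsure class.
    "Unsure": [],
}


def signal_class_of(signal_type):
    """Return the class a signal type belongs to, or None if unknown."""
    for signal_class, signal_types in SIGNAL_TAXONOMY.items():
        if signal_type in signal_types:
            return signal_class
    return None
```

For example, signal_class_of("genre") returns "Single Signal", and signal_class_of("syntactic + positional") returns "Combined Signal".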
A description of the sub-directories
and data follows:
- All files are in HTML format.
- The directory named
"Relation_Distribution_by_Signals" contains the
statistical distribution of individual relations
with respect to both signal types and specific
signals used to indicate those relations.
- The directory named
"Signal_Distribution_by_Relations" contains the
statistical distribution of signals (including
signal classes, signal types and specific signals)
with respect to individual relations.
- Abbreviations used in file names:
1. graph_plus_syn = graphical + syntactic
2. lex_plus_syn = lexical + syntactic
3. ref_plus_syn = reference + syntactic
4. sem_plus_syn = semantic + syntactic
5. syn_plus_pos = syntactic + positional
6. Syn_plus_sem = syntactic + semantic
7. comma_plus_pres_part_cl = comma + present participial clause
8. comma_plus_pst_part_cl = comma + past participial clause
9. ind_wrd_plus_pres_part_cl = indicative word + present participial clause
10. com_ref_plus_subj_NP = comparative reference + subject NP
11. dem_ref_plus_subj_NP = demonstrative reference + subject NP
12. per_ref_plus_subj_NP = personal reference + subject NP
13. prop_ref_plus_subj_NP = propositional reference + subject NP
14. gen_word_plus_subj_NP = general word + subject NP
15. lex_chain_plus_subj_NP = lexical chain + subject NP
16. mero_plus_subj_NP = meronymy + subject NP
17. rep_plus_subj_NP = repetition + subject NP
18. syno_plus_subj_NP = synonymy + subject NP
19. pres_part_cl_plus_beg = present participial clause + beginning
20. pst_part_cl_plus_beg = past participial clause + beginning
21. par_constr_plus_lex_chain = parallel syntactic construction + lexical chain
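When browsing the unzipped files, the abbreviations above can be expanded with a small lookup table. The following is a minimal sketch, not part of the supplementary material: the function name expand_abbreviation is hypothetical, and it assumes that an abbreviation appears somewhere in the file name before the extension, which this page does not state explicitly. Matching is case-insensitive because the list mixes capitalisation (for example Syn_plus_sem).

```python
from pathlib import PurePath

# Abbreviations exactly as listed on this page.
ABBREVIATIONS = {
    "graph_plus_syn": "graphical + syntactic",
    "lex_plus_syn": "lexical + syntactic",
    "ref_plus_syn": "reference + syntactic",
    "sem_plus_syn": "semantic + syntactic",
    "syn_plus_pos": "syntactic + positional",
    "Syn_plus_sem": "syntactic + semantic",
    "comma_plus_pres_part_cl": "comma + present participial clause",
    "comma_plus_pst_part_cl": "comma + past participial clause",
    "ind_wrd_plus_pres_part_cl": "indicative word + present participial clause",
    "com_ref_plus_subj_NP": "comparative reference + subject NP",
    "dem_ref_plus_subj_NP": "demonstrative reference + subject NP",
    "per_ref_plus_subj_NP": "personal reference + subject NP",
    "prop_ref_plus_subj_NP": "propositional reference + subject NP",
    "gen_word_plus_subj_NP": "general word + subject NP",
    "lex_chain_plus_subj_NP": "lexical chain + subject NP",
    "mero_plus_subj_NP": "meronymy + subject NP",
    "rep_plus_subj_NP": "repetition + subject NP",
    "syno_plus_subj_NP": "synonymy + subject NP",
    "pres_part_cl_plus_beg": "present participial clause + beginning",
    "pst_part_cl_plus_beg": "past participial clause + beginning",
    "par_constr_plus_lex_chain": "parallel syntactic construction + lexical chain",
}


def expand_abbreviation(file_name):
    """Return the expansion of the longest abbreviation found in file_name.

    Assumption: the abbreviation occurs in the file name stem.
    Returns None when no listed abbreviation is present.
    """
    stem = PurePath(file_name).stem.lower()
    matches = [abbr for abbr in ABBREVIATIONS if abbr.lower() in stem]
    if not matches:
        return None
    # Prefer the longest match so a longer abbreviation wins over a shorter one.
    return ABBREVIATIONS[max(matches, key=len)]
```

The file name in a call such as expand_abbreviation("rep_plus_subj_NP.html") is a made-up example; the actual file names in the zip may differ.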
©2014 Debopam Das