This page is slightly out of date, mostly because some of these topics
have been taken up, and other applications exist.
There are numerous open questions involving RST. This page is designed
to give you a feel for this diversity of issues, and in particular to
help students who are selecting thesis or dissertation topics, and
researchers who want to study in the neighborhood of RST. For each one:
1. I do not know of any full-scale exploration of the subject in the
literature, and
2. I believe that such an exploration could be a genuine contribution
to the art, and
3. I do not know of any substantial effort on the topic that is ongoing
and partly completed (but there could be some).
In a few cases, there has been work that needs to be repeated,
validated and perhaps extended.
I am not trying to identify thesis/dissertation-sized issues; that
is a matter for the teaching department. In some cases, one question
would take several theses to explore well. In other cases, several
related questions might be gathered to help define one thesis topic.
Also, I am not attempting to accommodate ancient academic boundaries.
We should feel free to integrate knowledge from a range of disciplines.
Certainly psychology, philosophy, linguistics, sociology, computer
science and some other disciplines all bear on the issues below.
Groups of open issues:
What actually happens in texts?
It is easy,
and also risky, to presume what surely happens in text. Many
of the facts about texts that exhibit particular RST structures
are unknown or nearly so. Some of our "general" knowledge
has been derived only from texts in a single language, only from a
small sample of texts, and only from unsystematic sampling. For many
of these topics it is not necessary to do a large sample study in order
to make progress; suggestive empirical results are easy to create,
and are worthwhile for guiding studies of the very few things that
need large-sample studies.
This sort of investigation is appropriate in this topic area because
the basic phenomena are only partly identified.
• What are the different patterns actually found in how relations
are realized in text?
What are their frequencies?
• What are the conditions under which each relation is signalled
or not?
• Under what conditions are Relational Propositions (RPs) defeasible
or not? [Relational Propositions are implicitly communicated by discourse
structure. For a short summary of the meaning and status of Relational
Propositions, see an archived message by Bill Mann to the RSTlist
on February 29, 2000, available on the list's archive page.]
• How does recognition of the relations of RST interact with the
cohesive devices of various languages?
• What sorts of RPs arise from texts?
• RPs have been explored a bit for nucleated relations, but much
less for multinuclear ones. What RPs arise from multinuclear relations?
Differences between languages, writing traditions, social classes,
educational backgrounds and expressive training raise the question of
what adjustments to RST would be required to analyze the texts that
various people produce. The notion of coherence seems quite widely
shared, but how do the details differ across society, or
among the people who act as RST observers? For the latter:
• How consistent are individual observers doing RST analysis? To what degree do
very similar observers tend to agree? How are these related to observer
training or experience?
• Does RST work on texts that are focused on reasoning or argumentation,
e.g. from mathematics, astronomy or law?
• Comparing oral and written monologues, do the RST patterns or
signaling patterns differ?
• What patterns are there in the ordering of spans in RST analyses?
• How have the set of RST relations and their realization patterns
varied along with centuries of language change?
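The observer-agreement questions in this list can be given a first, rough quantitative form with a chance-corrected agreement score. The sketch below computes Cohen's kappa for two observers. It assumes, as a strong simplification, that both observers have already agreed on segmentation and on which spans are related, so that each comparable position carries one relation label per observer. The function name `cohen_kappa` and the sample labels are hypothetical and are not taken from any study.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two observers labelling the same sequence of items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Proportion of items on which the two observers chose the same relation.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each observer's own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[r] * freq_b[r] for r in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both observers used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical relation labels from two observers for six comparable spans.
observer_1 = ["Evidence", "Background", "Contrast",
              "Evidence", "Concession", "Evidence"]
observer_2 = ["Evidence", "Background", "Concession",
              "Evidence", "Concession", "Justify"]

kappa = cohen_kappa(observer_1, observer_2)
print(f"kappa = {kappa:.3f}")
```

A score like this says nothing about disagreements in segmentation or tree shape, which are part of what the questions above ask about; it only covers the choice of relation label.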
Understanding RST Analysis
There are only a few studies of how observers function when
they analyze text. How does subjectivity appear in the
resulting analyses?
Prevailing opinion, especially following studies by Marcu, suggests
that rigid left-to-right analysis methods fail. The same is said of
top-down, right-to-left and bottom-up analysis methods. Analysis
seems to be most effective when it is opportunistic, not algorithmically guided.
Yet we presume that reading proceeds left to right in some sense, and
that the observer's access to text is very much like the reader's.
This paradox could be resolved if it were demonstrated that readers
move around in texts in a way that resembles good opportunistic analysis.
It is a speculation, perhaps easily dismissed using the technical literature
on reading.
RST and Other Linguistic Domains
The dominant viewpoint
in the definitions of RST is that it is a way to gather detailed information
about text in a systematic way. This contrasts, for example, with RST
being an abstract model of some part of the text understanding process.
As an orientation and preparation for building theories or models,
RST can contribute in interesting ways but it cannot be validated except
as a small fragment of a larger whole. It requires compatible "modules" or
subtheories or submodels to interact with. Issues arise about how RST
and these others interact.
• What requirements on the others does RST create, and what
constraints or demands does it place on the interacting models? (For
example, several RST relations including Evidence are formulated in
terms of degrees of belief. This may create a requirement that a semantic
model be able to somehow represent degrees of belief.)
Asking about these others, as a research planning exercise, involves identifying
other parts of linguistics that one is willing to (temporarily) take
for granted as being "good enough for our current purposes." Each
such step is therefore controversial, because even the most widely
acknowledged stable points in linguistics are being questioned.
Here we can call the Neighboring Linguistic Model NLM, and we can ask
some underspecified questions:
• What information does RST require
for which NLM is the obvious source?
• What can RST supply to NLM that will help NLM function effectively?
• Are the frameworks of assumptions of RST and NLM entirely compatible?
Among the wide range of possible NLMs, these generic areas (as examples of
distinct NLMs) seem to be particularly open to fruitful interaction:
• Semantics.
• Speech act theories.
• RST and relational propositions in interaction with proposition-like
entities of syntax: dependent clauses, adjective modifiers, etc.
• The reader's knowledge of the writer, based on the text, its
context of appearance and shared cultural expectations about
the writer.
So, taking semantics as an exemplar, we can ask:
• Can methods that explain the recognition of unsignalled relations
be reconciled with traditional semantic methods?
• Can methods that explain the finding of Relational Propositions
be reconciled with traditional semantic methods?
Beyond Relational Structure of Written Monologues
• What aspects of dialogue or multiparty interaction are not represented
by RST? For these, how can RST be extended and modified to allow
representation of the coherence and intentional structure of entire
interactions?
• Current representations of so-called Holistic Structure in RST lack
detail. How do the two varieties of structure combine to perform larger-scale
text functions?
How RST Discourse Structure Arises and How it is Recognized
We take RST as it is presently defined to be a method for structured
description rather than a model of how language is used. RST allows us to observe
with greater precision and certainty than we get from simply reading
text. Then two very large questions arise:
• How do these regularities
that we see using RST reflect how text is read (or how language
is understood)?
• How do these regularities that we see using RST reflect how text
is created (or how language is produced)?
Answers to these questions
would be models of language use, including both information structures
(or memories) and processes, embedded in some broader view of the
language user.
Progress in this area will be incremental and fragmentary for
the foreseeable future. It is certainly open for indefinitely
many theses.
Revising RST Analysis: Scope
There are numerous ideas about what RST, for no obvious reason, fails
to represent. Increasing the conceptual scope of RST might allow
observers to create much more informative text descriptions.
First, notice that RST does not contain any unary operators. It is
dominated at the relational level by binary (two-argument) operations,
and at the schema level (see Mann & Thompson 1988) by n-ary operations, along
with restrictions on mixing relations to particular nuclei. One cluster
of ideas suggests adding unary operators in the analysis result forms.
These operators could include: quotation, hedging, doubt, focus, indirectness
of speech acts, distancing of the writer from certain kinds of responsibilities,
and other possibilities. Unary operators (UOs) could be added as a
group or one by one. Each would raise the issue of how that UO can
be defined independent of form, and whether it carries any compatibility
constraints relative to other structures.
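As a rough illustration of what adding unary operators to the analysis result forms could mean, the sketch below attaches a set of operator names to any span of a small tree. Everything here is an assumption made for illustration: the `Span` class, the `unary_ops` field, the operator names and the rule that an operator's scope covers the whole span it is attached to. None of it is part of RST as defined.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """A span in an RST-like tree: a text segment, or a relation over child spans."""
    text: str = ""                                  # set for leaf segments
    relation: str = ""                              # e.g. "Evidence"; set for internal spans
    children: list = field(default_factory=list)    # (role, Span) pairs, role "N" or "S"
    unary_ops: tuple = ()                           # hypothetical extension, e.g. ("hedging",)


def leaves(span):
    """Return the text of every leaf segment under a span, in text order."""
    if not span.children:
        return [span.text]
    out = []
    for _role, child in span.children:
        out.extend(leaves(child))
    return out


def scope_of(root, op):
    """Return the leaf texts that fall inside any span marked with operator `op`."""
    if op in root.unary_ops:
        return leaves(root)
    out = []
    for _role, child in root.children:
        out.extend(scope_of(child, op))
    return out


# Hypothetical two-segment analysis: the whole span is hedged,
# and the satellite is additionally marked as quotation.
claim = Span(text="The program works as advertised.")
support = Span(text="Reviewers say it never crashed.", unary_ops=("quotation",))
root = Span(relation="Evidence",
            children=[("N", claim), ("S", support)],
            unary_ops=("hedging",))

print(scope_of(root, "quotation"))
print(scope_of(root, "hedging"))
```

The binary and n-ary relations are untouched by this change, which is one reason UOs could be added one at a time. The sketch does not address the harder issues raised above: defining each UO independent of form, and stating compatibility constraints.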
Another cluster of possibilities concerns analysis form. RST analyses,
like the diagrams in various styles of grammar, tend strongly toward
single tree structures for texts. Other comparable varieties of analysis,
such as Systemic Functional Grammar, use multiple coextensive structures,
each of which covers the entirety. One result is that the individual
structures can be much simpler, since the problem of coordinating several
varieties of knowledge has been dismissed. It has been suggested that
RST be reformulated to show Systemic-like varieties of structure, which
(done in an orthodox way) might be called Ideational, Interpersonal
and Textual. Among other things, this might make it possible to find
a Textual component of structure that would relate directly to ideas
of presentational style and genre. It is worth much more exploration
than it has been given. Revising RST Analysis: Process RST takes a
distinctive approach to subjectivity in analysis, both restricting
it and making its role explicit. The consequences of doing so have
never been systematically explored. Also, the possibility of using
this approach to subjectivity outside of RST has not been carefully
considered.
Conceptual Foundations
Non-Binary Concepts: RST is formulated
using various non-binary concepts, notions that can be fulfilled to
varying degrees. These include, in the reader, belief, desire to act, degree
of difference (e.g. in Contrast), conceptual compatibility (e.g. in
Concession), social right to express (e.g. in Justify), and adequacy
of the reader's prior knowledge (e.g. in Background). They also include
(in the observer) plausibility (both singular and comparative plausibility).
These definitional choices give RST broad representational power in
analysis, but they also severely restrict how RST ideas can be reconciled
with more traditional binary ideas in closely related disciplines,
such as semantics based on logic and set theory.
These choices should be reexamined carefully. It would also be worthwhile
to reconsider the whole notion of "Relational Propositions," for
which the most available interpretation is binary.
Intention: RST makes explicit use of the notion of the writer's intention,
and attributes an intention (the "effect") to every use of
every relation. This is broad but very shallow. Preliminary work on
developing RST used the concept of intention much more extensively.
Of course, it is extremely controversial in some academic circles to
do so. (See, for example, Intentions in the Experience of Meaning,
Raymond W. Gibbs Jr., Cambridge University Press, 1999, pp. 3-18.) RST
places few restrictions on what intentions may arise or combine. (For
example, one might suspect that the intentions associated with low
subtrees might be restricted to those that somehow serve the higher
intentions of the text. But there are notational and formulation issues,
and beyond those it is an unstudied empirical issue whether texts are
in fact consistent in that way.) Issues of how RST can be elaborated,
and also made more definite and informative in analysis of the writer's
intention, need to be studied at length.
Scientific Status: Very little work has been done on the scientific
status of RST, and of that work almost nothing appears in the literature.
More work would be worthwhile, especially because of RST's distinctive
treatment of subjectivity and the contrast between describing phenomena
and explaining them.
Communication: In 20th century physics, there emerged a "standard
model" of physical interactions, a consensus against which new
ideas were measured. Linguistics has no consensus and no "standard
model of symbolic communication" against which ideas about communication
might be measured. The convenient and most widely used model, the
so-called "code model," is also widely rejected. RST also says
very little about communication, partly for lack of an accepted alternative
to the code model. Ideas and empirical studies that would relate RST
to definite notions of communication could become very valuable.
Applications of RST
A small number of people have used RST as a writing guide. All such
use has been entirely informal, but there is the possibility of developing
a practical aid to writing based on RST. Closely related, we could
think of RST as a potential basis for conceptual training of students
or teachers of writing.
There are a number of computational systems that apply RST or are
inspired by it. The applications include:
• text generation,
• automatic summarization,
• text indexing,
• evaluation of students' compositions,
• (there are more).
Many of these show promise, but some are not ready
for general use. Extension or reformulation of existing approaches
appears worthwhile.
Beyond these, there are ideas that seem approachable but which have
no literature. Here are examples:
• Discourse-aware controlled languages for translation: There is
presently a collection of proprietary and public technologies for
automatic translation of documents into multiple languages, in which
the difficulties of translation are strongly limited by controlled
vocabularies and by strong limitations on how words and syntax are
used. To my knowledge, the restrictions in these methods are confined
to the sentence level. Based on RST, the methods could be extended to
whole texts, to unambiguous meaning and scope.
• Preanalyzed text: Assume that text, written by humans, can be entered
into computers attached to underlying RST diagrams or their equivalent,
representing the writers' low-level intentions. Then several kinds of
support for the writers can be considered:
    • Regularization of text organization.
    • Exhibiting implicit RPs.
    • Selective prompting for reader support (background, summaries,
      restatements, ...).
    • Automatic topical outlining.
    • Regularization of tense, irrealis and other features across
      multinuclear sets.
    • Assisted reordering.
  As computational research it might be worth trying.
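To make the preanalyzed-text idea slightly more concrete, here is a minimal sketch of one of the listed supports, automatic topical outlining. It assumes the text arrives with a tree in which every internal span has at least one nucleus, and it uses one simple heuristic: each span is summarized by the leaf reached by following nuclei downward. The `Node` class, the function names and the sample text are all hypothetical; this is one possible approach, not an established method for outlining.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A span in a preanalyzed text: a leaf segment or a relation over children."""
    text: str = ""                                 # set for leaf segments
    relation: str = ""                             # set for internal spans
    children: list = field(default_factory=list)   # (role, Node) pairs, role "N" or "S"


def nuclear_text(node):
    """Follow nuclei downward until a leaf segment is reached."""
    while node.children:
        # Assumption: every internal span has at least one nucleus.
        node = next(child for role, child in node.children if role == "N")
    return node.text


def outline(node, depth=0):
    """One outline line per internal span, indented by depth in the tree."""
    if not node.children:
        return []
    lines = ["  " * depth + f"{node.relation}: {nuclear_text(node)}"]
    for _role, child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


# Hypothetical preanalyzed text with three segments.
tree = Node(relation="Elaboration", children=[
    ("N", Node(text="RST can support writers.")),
    ("S", Node(relation="Evidence", children=[
        ("N", Node(text="Some writers already use it informally.")),
        ("S", Node(text="They report using it as a writing guide.")),
    ])),
])

for line in outline(tree):
    print(line)
```

For multinuclear spans this heuristic arbitrarily follows the first nucleus, which is one of several places where a real tool would need a better-motivated choice.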