parallel in Catalan, Spanish and English
- description of the lexicon
- markers signalling revision
- markers signalling cause
- markers signalling equality
- markers signalling context
- highly ambiguous markers
- vague markers
description of the lexicon
This is the starting discourse marker lexicon used in the thesis Representing discourse for automatic text summarization via shallow NLP techniques. The discourse markers listed here were the primary source of evidence for drawing the semantic maps from which an inventory of basic discursive meanings was obtained. This lexicon is also the basis for the implementation of a discourse segmenter and for the discourse analysis exploited by the e-mail summarizer Carpanta.
The lexicon is parallel in three languages: Catalan, Spanish and English. Therefore, in this starting version of the lexicon we have only included those discourse markers that have a near-synonym in one of the other languages. Those that do not have a near-synonym have been included in the extended version of the lexicon created by bootstrapping techniques applied to this starting lexicon.
The discourse markers that constitute the prototypical lexicon were obtained from previous work, mostly Knott (1996) and Marcu (1997), with the restriction that they are highly grammaticalized. We have also included in the lexicon some closed class words, obtained from the dictionary of the FreeLing morphosyntactic analyzer. We have discarded closed class words that are very vague and highly ambiguous discourse markers.
In this lexicon, discourse markers are characterized by their structural (continuation or elaboration) and semantic (revision, cause, equality, context) meanings. Each is also associated with a morphosyntactic class (part of speech, PoS): adverbial (A), phrasal (P) or conjunctive (C).
No information has been encoded about the reliability of discourse markers with respect to their discursive (vs. sentential) function. The only information of this kind that we provide is that discourse markers that are highly ambiguous with respect to their function are not included in the lexicon.
Sometimes a discourse marker is underspecified with respect to a meaning. We encode this with a hash (the tables below show such cells as --). This tends to happen with structural meanings, because these meanings can well be established by discursive mechanisms other than discourse markers, and the presence of the discourse marker just reinforces the relation, whichever it may be.
Sometimes a discourse marker is ambiguous with respect to two meanings. In these cases, we write the predominant meaning in italics and the secondary meaning in parentheses, or both of them in italics if no predominant meaning can be determined. Resolving such ambiguities normally requires information about the context of occurrence, but we have not associated discourse markers with the contextual features that could help to disambiguate them. Nevertheless, it seems that determining the adequate meaning of a particular instance of a discourse marker can be well addressed by general procedures, directly implemented in the algorithms that exploit the information stored in a lexicon (segmentation algorithms, discourse parsers, etc.).
All in all, the lexicon consists of 84 discourse markers, representing different discursive meanings. Some discourse markers have been assigned more or fewer than one meaning per dimension, because they are ambiguous or underspecified, respectively.
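The characterization described above can be pictured as a small record type. This is a minimal sketch, not the format used in the thesis: the class name `MarkerEntry`, its field names, and the use of `None` for an underspecified or unlisted value are assumptions made for illustration. The sample rows are copied from the tables in this document.

```python
from dataclasses import dataclass
from typing import Optional

STRUCTURAL = {"continuation", "elaboration"}
SEMANTIC = {"revision", "cause", "equality", "context"}
POS = {"A", "P", "C"}  # adverbial, phrasal, conjunctive


@dataclass(frozen=True)
class MarkerEntry:
    """One parallel lexicon entry (hypothetical layout)."""
    catalan: str
    english: str
    spanish: str
    structural: Optional[str]                 # None = underspecified
    semantic: str                             # predominant semantic meaning
    pos: Optional[str] = None                 # None = not listed in the table
    secondary_semantic: Optional[str] = None  # set only for ambiguous markers

    def __post_init__(self):
        # Reject values outside the inventories named in the description.
        if self.structural is not None and self.structural not in STRUCTURAL:
            raise ValueError(f"unknown structural meaning: {self.structural}")
        if self.semantic not in SEMANTIC:
            raise ValueError(f"unknown semantic meaning: {self.semantic}")
        if self.secondary_semantic is not None and self.secondary_semantic not in SEMANTIC:
            raise ValueError(f"unknown semantic meaning: {self.secondary_semantic}")
        if self.pos is not None and self.pos not in POS:
            raise ValueError(f"unknown PoS: {self.pos}")

    def is_underspecified(self) -> bool:
        return self.structural is None

    def is_ambiguous(self) -> bool:
        return self.secondary_semantic is not None


however = MarkerEntry("no obstant", "however", "no obstante",
                      "continuation", "revision", "A")
anyway = MarkerEntry("de tota manera", "anyway", "de todos modos",
                     None, "revision", "A")
to_begin_with = MarkerEntry("per començar", "to begin with", "para empezar",
                            None, "context", secondary_semantic="equality")
```

Keeping the secondary meaning in its own optional field mirrors the predominant/secondary distinction in the text without committing to any disambiguation procedure.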
structural | revision | cause | equality | context | total |
elaboration | 4 | 9 | 10 | 22 | 41 |
continuation | 9 | 9 | 6 | 4 | 28 |
underspecified | 1 | -- | 10 | 4 | 15 |
total | 14 | 18 | 26 | 32 | 84 |
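A cross-tabulation like the one above can be recomputed from the entries themselves. The sketch below is illustrative only: the function name `tally` and the tuple layout are assumptions, and `SAMPLE` holds just a handful of rows copied from the tables in this document, so its counts are not those of the full lexicon.

```python
from collections import Counter

# (structural, semantic) pairs; None stands for an underspecified
# structural meaning. Rows are copied from the tables in this document.
SAMPLE = [
    ("elaboration", "revision"),   # a pesar de / despite
    ("continuation", "revision"),  # no obstant / however
    (None, "revision"),            # de tota manera / anyway
    ("elaboration", "cause"),      # perquè / because
    ("continuation", "cause"),     # per això / that's why
    ("elaboration", "equality"),   # en resum / in sum
    (None, "equality"),            # a més / moreover
    ("elaboration", "context"),    # quan / when
]


def tally(pairs):
    """Count markers per (structural, semantic) cell, with row and column totals."""
    cells = Counter()
    for structural, semantic in pairs:
        row = structural or "underspecified"
        cells[(row, semantic)] += 1
        cells[(row, "total")] += 1
        cells[("total", semantic)] += 1
        cells[("total", "total")] += 1
    return cells


counts = tally(SAMPLE)
print(counts[("total", "revision")])  # 3
```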
markers signalling revision
Catalan | English | Spanish | structural | semantic | PoS |
a pesar de | despite | a pesar de | elaboration | revision | P |
encara que | although | aunque | elaboration | revision | P |
excepte | except | excepto | elaboration | revision | P |
malgrat | in spite of | pese a | elaboration | revision | P |
no obstant | however | no obstante | continuation | revision | A |
nogensmenys | nevertheless | sin embargo | continuation | revision | A |
en realitat | actually | en realidad | continuation | revision | A |
de fet | in fact | de hecho | continuation | revision | A |
al contrari | on the contrary | al contrario | continuation | revision | A |
el fet és que | the fact is | el hecho es que | continuation | revision | P |
és cert que | it is true that | es cierto que | continuation | revision | P |
però | but | pero | continuation | revision | P |
tot i això | even though | con todo | continuation | revision | A |
ara bé | well now | ahora bien | continuation | revision | A |
de tota manera | anyway | de todos modos | -- | revision | A |
- however differs from although in its structural value (continuation vs. elaboration). Although each of them can be used to rephrase the other in some contexts, however is attached to the segment that indicates continuation, while although is attached to the segment that indicates elaboration.
- actually / in fact: their primary meaning is marking evidentiality [ex], but they tend to be structurally equivalent to however, as we have shown using multiple alignment techniques (Alonso et al. 2004). In English their evidentiality meaning predominates over the revision meaning, so their contribution as discourse markers of revision is only reliable when they co-occur with other discourse markers that also signal revision [ex] or in certain punctuation contexts [ex], although they can also signal revision without any such further evidence [ex]. In Spanish and Catalan their primary meaning is revision ([ex] and [ex], respectively), comparable to it is true that. The kind of revision that these discourse markers tend to convey in Spanish and Catalan is correction (a prototypical example of correction would be: "This is not black, but white."). We can speculate that the revision meaning of these discourse markers is more primary in Spanish and Catalan than in English because in these languages the correction meaning tends to be expressed by discourse markers, as can be seen in the fact that it is lexicalized (sinó, sino), while in English it is covered by the all-purpose revision discourse marker but, and correction is only distinguished from other kinds of revision by other linguistic features.
example:
They could also help themselves by thinking through a problem before phoning the support desk.
In many cases a user will actually solve his or her own problem while on the phone to Neptune!
example:
a Standardisation has never been the IT industry's strong point, and the answer is "probably not". However, they don't actually all do the same job.
b He then argues that "it is not sufficient (for me) to tell the conference that there will be no return to mass picketing". Actually, I never mentioned picketing, mass picketing or otherwise, in my speech, but let that pass.
example:
a Amnesty warmly welcomed the release of prisoners of conscience and the repeal of certain articles, but has urged that the legislation be extended to include reform or repeal of further articles of the Turkish Penal Code, under which POCs may be held. The new law may in fact increase the already serious risk of torture facing political detainees.
b La idea inicial de Maragall fue celebrar una exposición internacional, pero ese propósito falló cuando alguien de su gabinete descubrió que habían llegado tarde para obtener el reconocimiento internacional para un acontecimiento de este tipo. En realidad poco importaba qué se hiciera. Tanto Clos como Maragall perseguían en esencia poner una nueva fecha al futuro de la ciudad.
c Tot va començar, com en les novel.les policíaques, amb un fiscal, entestat a treure a la llum el taló d'Aquil.les del president demòcrata. L'ham: una becària de 22 anys, grassoneta --usa la talla 46--, de pits exuberants i boca àmplia, una mica esbojarrada ja que creia tenir una relació sentimental quan en realitat va mantenir 10 trobades sexuals servides a domicili amb el senyor Clinton, qui, durant set mesos, es va obstinar a negar haver mantingut contacte físic amb ella.
- it is true that: in contrast with actually or in fact, its primary meaning is revision, like en realidad, en realitat, de fet, de hecho in Spanish and Catalan.
markers signalling cause
Catalan | English | Spanish | structural | semantic | PoS |
donat que | given that | dado que | elaboration | cause | P |
perquè | because | porque | elaboration | cause | P |
degut a | due to | debido a | elaboration | cause | P |
gràcies a | thanks to | gracias a | elaboration | cause | P |
per si | in case | por si | elaboration | cause | P |
per | because of | por | elaboration | cause | P |
per això | that's why | por eso | continuation | cause | A |
en conclusió | in conclusion | en conclusión | continuation | cause | A |
així que | thus | así que | continuation | cause | P/A/P |
com a conseqüència | as a consequence | como consecuencia | continuation | cause | A |
per | in order to | para | continuation | cause | P |
perquè | so that | para que | continuation | cause | P |
per aquesta raó | for this reason | por esta razón | continuation | cause | A |
per tant | so | por tanto | continuation | cause | A/C/A |
en efecte | in effect | en efecto | continuation | cause | A |
- in conclusion: while it looks similar to in sum, this discourse marker tends to convey new information, not to rephrase it. Compare the following example with the example for in sum. With respect to the effects on coherence and relevance, it is comparable to consecutive discourse markers like that's why or so then, which can also signal relations that are not motivated by a causal relation in the real world, but have the same rhetorical strength as those that are motivated by a real causal relation. It is comparable to in effect.
example
The European Court further ruled in this case that Arts 48 and 59 of the EC Treaty do not prevent a member state from requiring that the exercise of the profession of auditor in that state by a person qualified to carry on that profession in another member state be conditions which are objectively necessary to guarantee observation of professional rules concerning the permanence of the infrastructure in place for the completion of the work, the effective presence in the member state and assurance of the observation of professional ethics, unless respect for such rules and conditions is already guaranteed by a reviseur d'entreprises, whether a natural person or a firm, established and recognised in the state, and in whose service is placed, for the duration of the work, the person who intends to exercise the profession of auditor. In conclusion, one has to wonder whether the borders are in fact open.
- perquè / per: in Catalan these discourse markers are underspecified with respect to structural meaning; they can be equivalent to so that / to [ex] or to because / because of [ex].
example
a Avui sento por perquè han declarat impunes tots els caps d'Estat. Today I feel frightened because all heads of State have been granted impunity.
b La Generalitat ha fet una crida a la solidaritat perquè s'ocupin aquestes cases. The Generalitat has made a call to solidarity so that these houses are occupied.
markers signalling equality
Catalan | English | Spanish | structural | semantic | PoS |
en resum | in sum | en resumen | elaboration | equality | A |
concretament | specifically | concretamente | elaboration | equality | A |
en essència | essentially | en esencia | elaboration | equality | A |
en comparació | in comparison | en comparación | elaboration | equality | A |
en altres paraules | in other words | en otras palabras | elaboration | equality | A |
en particular | in particular | en particular | elaboration | equality | A |
és a dir | that is to say | es decir | elaboration | equality | C |
per exemple | for example | por ejemplo | elaboration | equality | A |
precisament | precisely | precisamente | elaboration | equality | A |
tal com | such as | tal como | elaboration | equality | P |
en darrer lloc | lastly | por último | continuation | equality | A |
per una banda | on the one hand | por un lado | continuation | equality | A |
per altra banda | on the other hand | por otro lado | continuation | equality | A |
a propòsit | by the way | a propósito | continuation | equality | A |
no només | not only | no sólo | continuation | equality | P |
sinó també | but also | sino también | continuation | equality | P |
en dues paraules | in short | en dos palabras | -- | equality | A |
a més | moreover | además | -- | equality | A |
també | also | también | -- | equality | A |
a banda | besides | aparte | -- | equality | A |
encara més | what's more | aún es más | -- | equality | A |
fins i tot | even | incluso | -- | equality | P |
especialment | especially | especialmente | -- | equality | A |
sobretot | above all | sobre todo | -- | equality | A |
- not only ... but also
- lastly: unlike first of all or to begin with, and like secondly or thirdly, this discourse marker is not ambiguous, because it requires a sequence context to be felicitous.
- on the one hand / on the other hand: like lastly, they require a sequence context to be felicitous, so they are not ambiguous with respect to their structural or semantic meaning, but their ambiguity with respect to scope varies greatly. If they co-occur [ex], their scope can be determined if we consider that the scope of on the one hand reaches until the point of occurrence of on the other hand, and that the latter has a scope of an equivalent size. However, if on the other hand occurs alone, its scope is very hard to determine automatically, and probably also by human judges.
example
It does occur to Fukuyama that religion might have some sort of unease to express with all this, but he appears to conceive of religion under only two modes. On the one hand, there is fundamentalist counter-ideology, the Islamic theocratic state. This, it is to be assumed, his liberal readers may take seriously as a threat, but hardly as an option. And on the other hand, there are "less organised religious impulses", religion as individual preference. This he knows can readily be accommodated another sort of consumer commodity, "within the sphere of personal life permitted in liberal societies".
- in short: it is ambiguous with respect to continuation or elaboration, because the discourse unit to which the discourse marker is attached can sometimes contribute new information, as in the following example.
example
The authors maintain that the role of women in the Tigrayan society is still closely linked to their status in the feudal system. 1975 women were treated as children. They were not allowed to own land nor speak. In short women were at the bottom of the hierarchy of oppression with no rights of any kind.
- in sum / essentially: they convey an elaborative relation because they repeat information that has already been given, even if this information is given in a shorter form. The utility of these discourse markers for automatic summarization is an ad-hoc property, subject to the task and not to their effects with respect to coherence and relevance assessment. Therefore, it has to be treated by manually creating a special rule that overrides general discursive rules.
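The scope heuristic described in the notes above for co-occurring on the one hand / on the other hand can be sketched as follows. This is only an illustration under stated assumptions: the sentence is taken as the unit of scope, matching is plain lower-cased substring search, and the function name `paired_scopes` is hypothetical. When the pair is incomplete the function returns `None`, reflecting the remark that the scope of a lone on the other hand is very hard to determine.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # half-open range of sentence indices


def paired_scopes(sentences: List[str]) -> Optional[Tuple[Span, Span]]:
    """Return the scopes of both markers, or None if they do not co-occur."""
    lowered = [s.lower() for s in sentences]
    first = next((i for i, s in enumerate(lowered)
                  if "on the one hand" in s), None)
    if first is None:
        return None
    second = next((i for i, s in enumerate(lowered)
                   if i > first and "on the other hand" in s), None)
    if second is None:
        return None
    # The first marker reaches up to the second one; the second marker
    # gets a scope of equivalent size, clipped at the end of the text.
    size = second - first
    return (first, second), (second, min(second + size, len(sentences)))


# Sentence openings from the example above, truncated for brevity.
text = [
    "It does occur to Fukuyama that religion might have some sort of unease ...",
    "On the one hand, there is fundamentalist counter-ideology ...",
    "This, it is to be assumed, his liberal readers may take seriously ...",
    "And on the other hand, there are less organised religious impulses ...",
    "This he knows can readily be accommodated ...",
]
print(paired_scopes(text))  # ((1, 3), (3, 5))
```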
markers signalling context
Catalan | English | Spanish | structural | semantic | PoS |
considerant | considering | teniendo en cuenta | elaboration | context | P |
després | after | después | elaboration | context | P |
abans | before | antes | elaboration | context | A |
originalment | originally | originalmente | elaboration | context | A |
a condició de | provided that | a condición de | elaboration | context | P |
durant | during | durante | elaboration | context | P |
mentre | while | mientras | elaboration | context | P |
a no ser que | unless | a no ser que | elaboration | context | P |
quan | when | cuando | elaboration | context | P |
on | where | donde | elaboration | context | P |
d'acord amb | in accordance with | de acuerdo con | elaboration | context | P |
lluny de | far from | lejos de | elaboration | context | P |
tan aviat com | as soon as | tan pronto como | elaboration | context | P |
de moment | for the moment | por el momento | elaboration | context | A |
entre | between | entre | elaboration | context | P |
cap a | towards | hacia | elaboration | context | P |
fins a | until | hasta | elaboration | context | P |
mitjançant | by means of | mediante | elaboration | context | P |
segons | following | según | elaboration | context | P |
en qualsevol cas | in any case | en cualquier caso | continuation | context | A |
aleshores | then | entonces | continuation | context | A |
respecte de | with respect to | respecto a | continuation | context | P |
en aquest cas | in that case | en ese caso | continuation | context | A |
si | if | si | -- | context | P |
sempre que | whenever | siempre que | -- | context | P |
sens dubte | no doubt | sin duda | -- | context | A |
alhora | at the same time | a la vez | -- | context | A |
- first of all / to begin with: like many discourse markers, these are lexically underspecified with respect to elaboration or continuation; they can reinforce progressive and elaborative relations that are actually signalled by means other than the discourse marker itself. They are also ambiguous between context and equality. If the marker is part of a sequence, as in example [ex], it signals equality; if not, as in example [ex], it signals context. By default, we ascribe it to context, and only if there is enough evidence is it ascribed to equality.
example
a Police say that cars are being stolen to be resold in car-starved east European countries. To begin with, thieves went for the likes of Golf GTis and BMWs, but now bread-and-butter cars are also being taken.
b In Four Saints Thomson's informality was given free reign since he first of all improvised the music at the piano then, when it stuck, wrote it down to a figured bass.
example:
a But there never was a threat to a new German-American special relationship, since there never was such a special relationship to begin with.
b "We are a bit of a way from that. But I certainly believe, first of all, we have to give what help we can," said Mr Hurd.
- in any case: it indicates continuation and context. It seems to have effects comparable to revision, but it is hard to find what is denied. It seems to have contrastive functions, which are better attributed to the properties of continuation than to any possible revision. It is comparable to topic-based but, although in that case there seems to be more correlation with items signalling negative polarity, which seems to support an interpretation as revision. In this respect, it is different from anyway, which always conveys revision.
example
In truth, however, humble photocopying has been overtaken by the wonders of the fax and personal computers complete with printers. Whatever the price of these latter (20 times their cost in the West) and reinforced customs procedures for their import, they are finding their way in. The controls in any case are surely doomed to fail.
- then: it is characterized by the two least marked meanings in each dimension, which makes it very close to narration.
- unless: even though it has inherent negative polarity, it conveys context rather than revision, comparable to if or in case.
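The default rule stated in the notes above for first of all / to begin with (context unless there is evidence of a sequence, in which case equality) could be approximated as below. The text does not say what counts as enough evidence, so the cue list `SEQUENCE_CUES`, the function name `semantic_meaning`, and the choice to look only at the text following the marker are all assumptions made for this sketch.

```python
import re

OPENERS = ("first of all", "to begin with")
# Hypothetical evidence of a sequence; not taken from the lexicon.
SEQUENCE_CUES = ("secondly", "thirdly", "lastly", "then", "now")


def semantic_meaning(text: str) -> str:
    """Return 'equality' if a sequence cue follows the opener, else 'context'."""
    lowered = text.lower()
    for opener in OPENERS:
        pos = lowered.find(opener)
        if pos == -1:
            continue
        rest = lowered[pos + len(opener):]
        if any(re.search(rf"\b{cue}\b", rest) for cue in SEQUENCE_CUES):
            return "equality"
        return "context"  # default when no sequence evidence is found
    raise ValueError("no opener marker found")


in_sequence = ("To begin with, thieves went for the likes of Golf GTis and BMWs, "
               "but now bread-and-butter cars are also being taken.")
standalone = ("But there never was a threat to a new German-American special "
              "relationship, since there never was such a special relationship "
              "to begin with.")
print(semantic_meaning(in_sequence), semantic_meaning(standalone))
```

A surface cue list like this one is brittle; it is meant only to show where the default and the override sit, not how reliable evidence should be gathered.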
highly ambiguous markers
Catalan | Spanish | English | structural | semantic |
com | como | like | elaboration | equality, context, cause |
com | como | since | elaboration | cause, context |
des de | desde | since | elaboration | cause, context |
sobre | sobre | about | continuation, elaboration | context |
sobre | sobre | over | continuation, elaboration | context |
abans de res | antes que nada | first of all | -- | context (or equality) |
per començar | para empezar | to begin with | -- | context (or equality) |
vague markers
Catalan | Spanish | English |
i | y/e | and |
ni | ni | nor/neither |
o | o/u | or |
que | que | that |
amb | con | with |
sense | sin | without |
contra | contra | against |
en | en | in |
a | a | at/to |