
Llamipa: An Incremental Discourse Parser

Kate Thompson, Akshay Chaturvedi, Julie Hunter, Nicholas Asher

TL;DR

This work addresses incremental SDRT-style discourse parsing by fine-tuning Llama-based LLMs to jointly predict attachment links and relation types, using the discourse structure already inferred for earlier turns. The Llamipa model uses a large context window ($k=15$) and QLoRA fine-tuning to produce incremental predictions, achieving state-of-the-art results on MSDC, STAC-L, and out-of-domain Molweni in relation labeling. Through ablations, the authors show that both explicit discourse structure in the input and a broad context window are crucial for robust long-distance relation predictions, especially for Narration and Correction. The approach holds promise for downstream conversational systems and tasks requiring structured discourse representations; the authors acknowledge that it requires pre-segmented input and that cross-domain generalization is limited.

Abstract

This paper provides the first discourse parsing experiments with a large language model (LLM) finetuned on corpora annotated in the style of SDRT (Segmented Discourse Representation Theory; Asher, 1993; Asher and Lascarides, 2003). The result is a discourse parser, Llamipa (Llama Incremental Parser), that leverages discourse context, leading to substantial performance gains over approaches that use encoder-only models to provide local, context-sensitive representations of discourse units. Furthermore, it can process discourse data incrementally, which is essential for the eventual use of discourse information in downstream tasks.

Paper Structure (10 sections, 4 figures, 6 tables)


Figures (4)

  • Figure 1: MSDC dialogue example showing CDUs composed of EEUs for builder action sequences and multiparent discourse units (MPDUs). The Narration on the right links "high level" instructions, which are particularly hard to predict, as explained in the ablation section.
  • Figure 2: Depiction of dialogue increments seen during Llamipa3+p generation. During generation, the predicted relations are added to the context structure for the following increment.
  • Figure 3: Depiction of the targeted perturbation ablations described in the ablation section. Left: We remove the Clarification-question from the structure of samples that should predict Question-answer pair relations, and see a 50-point drop in F1 for Question-answer pair prediction. Right: We change the first Correction in the structure to Acknowledgement in samples that should predict a second Correction in response to the first.
  • Figure 4: Correction triangles are composed of a Correction to an action sequence, followed by a new action sequence that is the Result of the corrective move and a Correction of the first action sequence.
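
The incremental procedure depicted in Figure 2 can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the `predict` callback signature, and the toy predictor are all hypothetical, and the window is assumed to consist of the $k$ units preceding the new one. In the actual system, the prediction step would be a call to the fine-tuned LLM.

```python
from typing import Callable, List, Tuple

# A relation is (source_index, target_index, relation_label).
Relation = Tuple[int, int, str]
# A window entry is (global_index, unit_text).
Unit = Tuple[int, str]

# Hypothetical predictor signature: given the context window, the structure
# predicted so far inside that window, and the new unit, return the relations
# that attach the new unit to the context.
Predictor = Callable[[List[Unit], List[Relation], int, str], List[Relation]]


def parse_incrementally(
    units: List[str], predict: Predictor, k: int = 15
) -> List[Relation]:
    """Parse a dialogue one unit at a time, feeding predictions back in."""
    structure: List[Relation] = []
    for i in range(1, len(units)):
        start = max(0, i - k)  # assumed: window = the k preceding units
        window = [(j, units[j]) for j in range(start, i)]
        # Keep only already-predicted relations that lie fully inside the window.
        context = [r for r in structure if r[0] >= start and r[1] >= start]
        # Predicted relations become part of the context for the next increment.
        structure.extend(predict(window, context, i, units[i]))
    return structure


def toy_predict(
    window: List[Unit], context: List[Relation], index: int, text: str
) -> List[Relation]:
    """Placeholder predictor: attach each unit to its predecessor."""
    return [(index - 1, index, "Continuation")]


if __name__ == "__main__":
    dialogue = ["build a tower", "how tall?", "three blocks", "ok"]
    for relation in parse_incrementally(dialogue, toy_predict, k=2):
        print(relation)
```

The point of the loop is that each increment sees predicted structure rather than gold structure, so errors in early predictions can propagate to later ones.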