Incomplete Utterance Rewriting with Editing Operation Guidance and Utterance Augmentation

Zhiyu Cao, Peifeng Li, Yaxin Fan, Qiaoming Zhu

TL;DR

This work targets Incomplete Utterance Rewriting (IUR) by addressing two main challenges: irrelevant token generation and limited training data. It introduces EO-IUR, a multi-task framework that uses editing operation labels to guide generation and a token-level heterogeneous dialogue graph to encode syntactic and coreference structure, supplemented by two data augmentation strategies (editing-operation-based incomplete utterance augmentation and LLM-based historical utterance augmentation). The model jointly optimizes editing-label prediction and rewritten-utterance generation, with cross-attention modulated by editing-label probabilities, and benefits from a graph-augmented representation of dialogue. Across Chinese and English datasets (REWRITE, RES200K, TASK), EO-IUR achieves state-of-the-art results, especially in exact-match and completeness metrics, and ablation studies confirm the contributions of the editing guidance, graph structure, and augmentations. The approach demonstrates improved robustness and precision in rewriting, paving the way for more reliable downstream dialogue understanding and task execution.
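To make "cross-attention modulated by editing-label probabilities" concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's formula: the summary does not say how the modulation is computed, so the multiply-then-renormalise rule below and the names `modulated_attention`, `scores`, and `edit_probs` are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def modulated_attention(scores, edit_probs):
    """Reweight attention over context tokens by editing-label probabilities.

    ASSUMPTION: multiply-then-renormalise is only one plausible form of
    modulation; the source summary does not specify the actual rule.
    """
    attn = softmax(scores)
    weighted = [a * p for a, p in zip(attn, edit_probs)]
    total = sum(weighted)
    if total == 0.0:
        # No token is flagged as relevant: fall back to plain attention.
        return attn
    return [w / total for w in weighted]


# Four context tokens with equal raw attention scores; the (hypothetical)
# labeling module considers the two middle tokens likely to be needed.
scores = [1.0, 1.0, 1.0, 1.0]
edit_probs = [0.1, 0.9, 0.9, 0.1]
weights = modulated_attention(scores, edit_probs)
```

With equal raw scores, the attention mass moves toward the tokens with high editing-label probability, which is the intuition behind suppressing irrelevant tokens during generation.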

Abstract

Although prevailing generation methods for Incomplete Utterance Rewriting (IUR) can generate coherent utterances, they often include irrelevant and redundant tokens in the rewritten utterances because they fail to focus on critical tokens in the dialogue context. Moreover, the limited size of the training datasets leaves IUR models insufficiently trained. To address the first issue, we propose a multi-task learning framework, EO-IUR (Editing Operation-guided Incomplete Utterance Rewriting), which uses editing operation labels produced by a sequence labeling module to guide the generation model to focus on critical tokens. We also introduce a token-level heterogeneous graph to represent dialogues. To address the second issue, we propose a two-dimensional utterance augmentation strategy: editing operation-based incomplete utterance augmentation and LLM-based historical utterance augmentation. Experimental results on three datasets demonstrate that EO-IUR outperforms previous state-of-the-art (SOTA) baselines in both open-domain and task-oriented dialogue. The code will be available at https://github.com/Dewset/EO-IUR.
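As a rough illustration of what editing operation labels can look like, the sketch below aligns an incomplete utterance with its rewrite using Python's standard `difflib` and then marks the context tokens that the rewrite had to bring in. The label scheme, the helper names `edit_operations` and `label_context`, and the example dialogue are all assumptions made for illustration. The abstract does not describe how the authors derive their labels.

```python
from difflib import SequenceMatcher


def edit_operations(incomplete, rewritten):
    """Align two token lists and return the non-trivial edit operations.

    Each item is (op, incomplete_span, rewritten_span), where op is
    'insert', 'replace', or 'delete' as reported by difflib.
    """
    matcher = SequenceMatcher(a=incomplete, b=rewritten, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            ops.append((tag, incomplete[i1:i2], rewritten[j1:j2]))
    return ops


def label_context(context, ops):
    """Mark context tokens used by the rewrite (hypothetical label scheme).

    A context token gets label 1 if it appears in any inserted or
    replacing span of the rewrite, and 0 otherwise.
    """
    needed = {tok for _, _, new in ops for tok in new}
    return [1 if tok in needed else 0 for tok in context]


# Made-up example dialogue, whitespace-tokenized for simplicity.
context = "do you like the new Nolan movie ?".split()
incomplete = "yes , I love it".split()
rewritten = "yes , I love the new Nolan movie".split()

ops = edit_operations(incomplete, rewritten)
labels = label_context(context, ops)
print(ops)     # [('replace', ['it'], ['the', 'new', 'Nolan', 'movie'])]
print(labels)  # [0, 0, 0, 1, 1, 1, 1, 0]
```

Here the pronoun "it" is replaced by a span copied from the context, and the four context tokens forming that span are the "critical tokens" a labeling module would be trained to identify.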

Paper Structure

This paper contains 24 sections, 6 equations, 2 figures, 14 tables.

Figures (2)

  • Figure 1: Overview of our model EO-IUR, which includes utterance augmentation, construction of token-level heterogeneous graph convolutional neural network, editing operation labeling, and editing operation-guided IUR.
  • Figure 2: Human evaluation on REWRITE.