
CoUDA: Coherence Evaluation via Unified Data Augmentation

Dawei Zhu, Wenhao Wu, Yifan Song, Fangwei Zhu, Ziqiang Cao, Sujian Li

TL;DR

CoUDA tackles coherence evaluation by addressing both global discourse organization and local sentence transitions under data scarcity. It introduces a unified data augmentation framework consisting of global shuffling and a novel generative local augmentor with context truncation and coherence filtering, plus a unified scoring mechanism that combines global and local cues. With a compact model of 233M parameters, CoUDA achieves state-of-the-art correlations on SummEval and superior pairwise ranking on INSteD, often outperforming GPT-4-based metrics in the pointwise setting. The approach offers a practical, linguistically informed, and efficient solution for robust discourse coherence assessment in summarization and related tasks.

Abstract

Coherence evaluation aims to assess the organization and structure of a discourse, which remains challenging even in the era of large language models. Because annotated data is scarce, data augmentation is commonly used to train coherence evaluation models. However, previous augmentations for this task rely primarily on heuristic rules and lack design criteria as guidance. In this paper, we take inspiration from the linguistic theory of discourse structure and propose a data augmentation framework named CoUDA. CoUDA breaks discourse coherence down into global and local aspects and designs an augmentation strategy for each. For local coherence in particular, we propose a novel generative strategy for constructing augmentation samples, which involves post-pretraining a generative model and applying two mechanisms to control the difficulty of the generated samples. During inference, CoUDA jointly evaluates both global and local aspects to comprehensively assess the overall coherence of a discourse. Extensive experiments in coherence evaluation show that, with only 233M parameters, CoUDA achieves state-of-the-art performance in both pointwise scoring and pairwise ranking tasks, even surpassing recent GPT-3.5 and GPT-4 based metrics.
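
The global side of the augmentation described above, shuffling sentences to break discourse-level organization, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `shuffle_sentences` and `build_global_samples`, the 1/0 label convention, and the rule of skipping discourses that cannot be reordered are all assumptions made for illustration. The generative local augmentor is not sketched here, since it depends on a post-pretrained model that the summary does not specify.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical label convention: 1 = coherent, 0 = incoherent.
COHERENT, INCOHERENT = 1, 0


def shuffle_sentences(sentences: List[str], rng: random.Random) -> Optional[List[str]]:
    """Return a reordering that differs from the input, or None if none exists."""
    if len(set(sentences)) < 2:
        return None  # a single (or fully repeated) sentence cannot be reordered
    shuffled = sentences[:]
    rng.shuffle(shuffled)
    if shuffled == sentences:
        # Rotating by one always changes the order when two sentences differ.
        shuffled = sentences[1:] + sentences[:1]
    return shuffled


def build_global_samples(
    discourses: List[List[str]], seed: int = 0
) -> List[Tuple[str, int]]:
    """Pair each original discourse with one shuffled negative sample."""
    rng = random.Random(seed)
    samples: List[Tuple[str, int]] = []
    for sentences in discourses:
        samples.append((" ".join(sentences), COHERENT))
        negative = shuffle_sentences(sentences, rng)
        if negative is not None:
            samples.append((" ".join(negative), INCOHERENT))
    return samples
```

Calling `build_global_samples([["First.", "Second.", "Third."]])` would yield the original text labeled coherent, followed by one reordered copy labeled incoherent, which is the kind of pair a coherence/incoherence classifier could be trained on.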

Paper Structure

This paper contains 43 sections, 6 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Example of global coherence and local coherence in a discourse. Globally, the discourse is well-structured, with an opening sentence to introduce the argument, five sentences to give evidence from two aspects, and a closing sentence for the conclusion. Locally, the focused items, which are denoted in red and purple, transfer smoothly from sentence to sentence.
  • Figure 2: Overview of our proposed CoUDA framework. (a): First, we use global and local augmentation to create negative samples $\mathcal{D}_g^-$ and $\mathcal{D}_l^-$, respectively. (b): Then, we combine $\mathcal{D}_g^-$ and $\mathcal{D}_l^-$ with the original discourses $\mathcal{D}$ to train our metric model via coherence/incoherence classification. (c): In the inference phase, our metric model scores the whole discourse for the global score $S_g$, and scores each consecutive sentence pair for the local score $S_l$. $S_g$ and $S_l$ are combined to produce the final coherence score.
  • Figure 3: Ablation study of global and local scores in our unified scoring strategy.
  • Figure 4: Average of dataset-level Spearman / Pearson / Kendall correlation on SummEval w.r.t. discourses containing different numbers of sentences.
  • Figure 5: Skewed Template for G-Eval-3.5 in pairwise ranking. We adopt the Balanced Position Calibration strategy proposed by Wang et al. (2023) to alleviate the positional bias of LLMs.
  • ...and 1 more figures
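
The inference step in the Figure 2 caption (a global score $S_g$ over the whole discourse, a local score $S_l$ over consecutive sentence pairs, then a combination) can be sketched as follows. The caption does not state how $S_g$ and $S_l$ are combined, so the weighted average used here is an assumption, as are the function name `unified_score`, the parameter `global_weight`, the choice to average the pair scores, and the callable `score_fn`, which stands in for a trained classifier returning the probability that a text is coherent.

```python
from typing import Callable, List


def unified_score(
    sentences: List[str],
    score_fn: Callable[[str], float],
    global_weight: float = 0.5,
) -> float:
    """Combine a whole-discourse score with an averaged sentence-pair score."""
    if not sentences:
        raise ValueError("discourse must contain at least one sentence")

    # Global score S_g: the scorer sees the entire discourse at once.
    s_g = score_fn(" ".join(sentences))

    # Local score S_l: average over each consecutive sentence pair.
    pairs = [f"{a} {b}" for a, b in zip(sentences, sentences[1:])]
    if not pairs:
        return s_g  # a single sentence has no transitions to score
    s_l = sum(score_fn(pair) for pair in pairs) / len(pairs)

    # Assumed combination rule: a simple weighted average of S_g and S_l.
    return global_weight * s_g + (1.0 - global_weight) * s_l
```

In practice `score_fn` would wrap the trained metric model; any function mapping a string to a value in [0, 1] can be plugged in to try the sketch.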