Open-Domain Text Evaluation via Contrastive Distribution Methods

Sidi Lu; Hongyi Liu; Asli Celikyilmaz; Tianlu Wang; Nanyun Peng

TL;DR

This work introduces Contrastive Distribution Methods (CDM) for open-domain text evaluation, framing model quality as an oracle function $E(p)$ and leveraging a partial order over model sizes to contrast the distributions of two models. It develops two evaluation paradigms: Generative CDM, which synthesizes challenging negative samples from a degraded distribution to train a discriminator, and Discriminative CDM, which directly aggregates the step-wise contrastive momentum between an amateur and an expert model into a quality score. The authors demonstrate that CDM correlates better with human judgments than strong baselines on multi-turn dialogue coherence and commonsense generation tasks, including a CommonsGen-trinity evaluation where CDM achieves state-of-the-art results. Overall, CDM provides a scalable, reference-free, distribution-focused framework for evaluating open-domain generation, with practical use in model development and benchmarking.

Abstract

Recent advancements in open-domain text generation, driven by the power of large pre-trained language models (LLMs), have demonstrated remarkable performance. However, assessing these models' generation quality remains a challenge. In this paper, we introduce a novel method for evaluating open-domain text generation called Contrastive Distribution Methods (CDM). Leveraging the connection between increasing model parameters and enhanced LLM performance, CDM creates a mapping from the _contrast_ of two probabilistic distributions -- one known to be superior to the other -- to quality measures. We investigate CDM for open-domain text generation evaluation under two paradigms: 1) _Generative_ CDM, which harnesses the contrast of two language models' distributions to generate synthetic examples for training discriminator-based metrics; 2) _Discriminative_ CDM, which directly uses distribution disparities between two language models for evaluation. Our experiments on coherence evaluation for multi-turn dialogue and commonsense evaluation for controllable generation demonstrate that CDM correlates better with human judgment than existing automatic evaluation metrics, highlighting the strong performance and generalizability of our approach.

Paper Structure (35 sections, 4 equations, 3 figures, 10 tables, 1 algorithm)

Introduction
Background and Related Works
Open-Domain Text Evaluation
Discriminator-based Metrics
Distribution/Divergence-based Metrics
Contrastive Decoding, Contrastive Momentum and ExPO
Methodology
Notations and Problem Formulation
The Partial Order Assumption
Limitation
First Order Approximation of $E(p)$
Contrastive Distribution Methods
Generative CDM
Implementation Details.
Discriminative CDM
...and 20 more sections

Figures (3)

Figure 1: Conceptual illustration of the Contrastive Distribution Methods (CDM). (a) Generative CDM generates negative examples for training a discriminator-based metric. (b) Discriminative CDM directly evaluates the distribution/sequence by contrasting the step-wise likelihood scores.
Figure 2: (a) While it is hard to assume a total order for models from different model classes under the oracle metric $E(p)$, it is plausible to assume partial orders for models from the same model class. (b) Generative CDM uses the degraded distribution $p_n$ to synthesize fake samples for training a discriminator as the metric. The warm/cold region indicates the decision boundary of the resulting trainable metric induced by fake samples from $p_n$. (c) Discriminative CDM directly determines the decision boundary by pooling the values of the step-wise contrastive momentum.
Figure 3: A more detailed illustration of the two Contrastive Distribution Methods (CDM). (a) Generative CDM constructs fake negative samples from positive ones for training a discriminator-based metric. (b) Discriminative CDM directly evaluates the distribution/sequence by contrasting and aggregating the step-wise likelihood scores.
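
The Discriminative CDM idea described in the captions above (contrast the step-wise likelihood scores of two models, then aggregate them) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration rather than the paper's stated formulation: the step-wise contrast is taken to be the per-token log-probability difference between the expert and the amateur model, the aggregation is a plain mean, and the function names and toy log-probabilities are hypothetical.

```python
from typing import List, Sequence


def stepwise_contrast(
    expert_logprobs: Sequence[float],
    amateur_logprobs: Sequence[float],
) -> List[float]:
    """Per-token contrast between two models scoring the same sequence.

    Assumption: the contrast at each step is log p_expert - log p_amateur.
    """
    if len(expert_logprobs) != len(amateur_logprobs):
        raise ValueError("both models must score the same number of tokens")
    return [e - a for e, a in zip(expert_logprobs, amateur_logprobs)]


def discriminative_score(
    expert_logprobs: Sequence[float],
    amateur_logprobs: Sequence[float],
) -> float:
    """Aggregate the step-wise contrasts into one sequence-level score.

    Assumption: mean pooling; other pooling choices are equally plausible.
    """
    steps = stepwise_contrast(expert_logprobs, amateur_logprobs)
    if not steps:
        raise ValueError("cannot score an empty sequence")
    return sum(steps) / len(steps)


# Toy per-token log-probabilities (made up for illustration only).
expert = [-1.0, -0.5, -2.0]
amateur = [-1.5, -1.5, -2.5]
score = discriminative_score(expert, amateur)
```

Under these assumptions, a higher score means the larger model prefers the sequence more strongly than the smaller one does. In practice the per-token log-probabilities would come from two language models of the same model class, in line with the partial-order assumption in Figure 2(a).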
