CERET: Cost-Effective Extrinsic Refinement for Text Generation

Jason Cai; Hang Su; Monica Sunkara; Igor Shalyminov; Saab Mansour

TL;DR

This work proposes CERET, a method for refining text generations by considering semantic stability, entailment and inter-sample uncertainty measures, and shows that CERET outperforms Self-consistency and Self-rerank baselines consistently under various task setups.

Abstract

Large Language Models (LLMs) are powerful models for generation tasks, but they may not produce high-quality outputs on their first attempt. Apart from model fine-tuning, existing approaches to improve prediction accuracy and quality typically involve LLM self-improvement or self-reflection, which incorporates feedback from the models themselves. Despite their effectiveness, these methods are hindered by their high computational cost and lack of scalability. In this work, we propose CERET, a method for refining text generations by considering semantic stability, entailment, and inter-sample uncertainty measures. Experimental results show that CERET consistently outperforms Self-consistency and Self-rerank baselines under various task setups, by ~1.6% in Rouge-1 for abstractive summarization and ~3.5% in hit rate for question answering. Compared to the LLM Self-rerank method, our approach requires only 9.4% of its latency and is more cost-effective.
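The abstract names three per-candidate measures, and the figure captions mention coefficients tuned on validation sets. As a rough illustration only, the sketch below assumes the final score is a weighted linear combination of the three measures and that the highest-scoring candidate is selected. The function name `rerank_candidates`, the coefficient values, and the linear form are all hypothetical; this page does not state how the paper computes its final score.

```python
from typing import List, Sequence, Tuple


def rerank_candidates(
    candidates: Sequence[str],
    stability: Sequence[float],
    entailment: Sequence[float],
    uncertainty: Sequence[float],
    coeffs: Tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> Tuple[int, List[float]]:
    """Pick the candidate with the highest combined score.

    ASSUMPTION: the combination is linear with tunable coefficients.
    The scores themselves are taken as given inputs; how each one is
    computed is outside the scope of this sketch.
    """
    n = len(candidates)
    if n == 0:
        raise ValueError("at least one candidate is required")
    if not (len(stability) == len(entailment) == len(uncertainty) == n):
        raise ValueError("every score list must have one entry per candidate")

    a, b, c = coeffs
    # One combined score per candidate (hypothetical linear form).
    finals = [
        a * s + b * e + c * u
        for s, e, u in zip(stability, entailment, uncertainty)
    ]
    # Index of the best candidate; ties resolve to the earliest index.
    best = max(range(n), key=finals.__getitem__)
    return best, finals


if __name__ == "__main__":
    texts = ["summary A", "summary B", "summary C"]
    best, finals = rerank_candidates(
        texts,
        stability=[0.2, 0.9, 0.5],
        entailment=[0.1, 0.8, 0.4],
        uncertainty=[0.3, 0.1, 0.2],
        coeffs=(1.0, 1.0, -1.0),  # illustrative values, not from the paper
    )
    print(texts[best])
```

Because the selection step only needs precomputed scores for candidates that were already sampled, a scheme of this shape avoids an extra LLM call at ranking time, which is in line with the latency claim in the abstract.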

Paper Structure (22 sections, 10 equations, 6 figures, 9 tables)

Introduction
Approach
System Architecture
Semantic Stability Scoring
Entailment Scoring
Inter-sample Uncertainty Scoring
Computation of Final Score
Experimental Setup
Datasets
Baselines and Evaluation Metrics
Implementation Details
Results and Analysis
Effectiveness and Efficiency
Ablations and Hyperparameter Analysis
Related Work
...and 7 more sections

Figures (6)

Figure 1: CERET overview
Figure 2: Inter-sample uncertainty region
Figure 3: Latency (sec) per input sample. From left to right: *BERT inference, CERET, and LLM self-rerank.
Figure 4: Relative performance gains on validation and test sets. The best coefficient combination is tuned on validation sets. Evaluation metrics: Rouge-1 for TodSum and DialogSum, and hit rate for Trivia QA and Natural Questions.
Figure 5: Sensitivity analysis of coefficients for TodSum
...and 1 more figure
