
Selective Generation for Controllable Language Models

Minjae Lee, Kyungmin Kim, Taesoo Kim, Sangdon Park

TL;DR

This work tackles hallucination in generative language models (GLMs) by redefining correctness through textual entailment and introducing a false discovery rate with respect to entailment (FDR-E). It develops two abstention-based algorithms, the supervised $\texttt{SGen}^{\texttt{Sup}}$ and the semi-supervised $\texttt{SGen}^{\texttt{Semi}}$. The latter leverages unlabeled data through conformal-prediction-based pseudo-labeling and a flexible neuro-selection framework to optimize selection efficiency. The authors provide PAC-style guarantees for both the supervised and semi-supervised settings. On open- and closed-source GLMs with realistic entailment labeling, they show empirically that FDR-E is controlled at the target risk level. The approach enables safer deployment of GLMs for open-ended QA by reliably limiting incorrect generations while keeping selection efficiency comparable to baselines; code and datasets are publicly available.
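As a minimal sketch of the FDR-E quantity, assume the selective generator emits an answer when a scalar confidence score reaches a threshold and abstains otherwise (the score and threshold rule are assumptions, not details given in this summary). The empirical FDR-E is then the fraction of emitted answers that are not entailed, i.e., $G(\mathbf{x}) \notin E(\mathbf{y})$, among those the generator does not abstain on. The names `selective_fdr_e`, `scores`, `entailed`, and `tau` are hypothetical.

```python
from typing import Sequence


def selective_fdr_e(
    scores: Sequence[float], entailed: Sequence[bool], tau: float
) -> tuple[float, float]:
    """Empirical FDR-E and selection rate of a hypothetical threshold-based selective generator.

    scores[i]   : confidence score of the i-th generated answer (assumed scorer)
    entailed[i] : True if the generated answer G(x_i) lies in the entailment set E(y_i)
    tau         : answer when scores[i] >= tau, abstain otherwise
    """
    if len(scores) != len(entailed):
        raise ValueError("scores and entailed must have the same length")
    # Keep the entailment labels of the answers that are actually emitted.
    selected = [e for s, e in zip(scores, entailed) if s >= tau]
    if not selected:
        # Abstaining on everything: FDR is 0 by the usual 0/0 := 0 convention.
        return 0.0, 0.0
    false_discoveries = sum(1 for e in selected if not e)
    return false_discoveries / len(selected), len(selected) / len(scores)
```

For example, scores `[0.9, 0.8, 0.6, 0.3]` with labels `[True, False, True, False]` and `tau=0.5` emit three answers, one of them not entailed, giving an FDR-E of 1/3 at a selection rate of 3/4.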

Abstract

Trustworthiness of generative language models (GLMs) is crucial for their deployment in critical decision-making systems. Hence, certified risk control methods such as selective prediction and conformal prediction have been applied to mitigate the hallucination problem in various supervised downstream tasks. However, the lack of an appropriate correctness metric hinders applying such principled methods to language generation tasks. In this paper, we circumvent this problem by leveraging the concept of textual entailment to evaluate the correctness of the generated sequence, and propose two selective generation algorithms that control the false discovery rate with respect to the textual entailment relation (FDR-E) with a theoretical guarantee: $\texttt{SGen}^{\texttt{Sup}}$ and $\texttt{SGen}^{\texttt{Semi}}$. $\texttt{SGen}^{\texttt{Sup}}$, a direct modification of selective prediction, is a supervised learning algorithm that exploits entailment-labeled data annotated by humans. Since human annotation is costly, we further propose a semi-supervised version, $\texttt{SGen}^{\texttt{Semi}}$, which fully utilizes the unlabeled data by pseudo-labeling, leveraging an entailment set function learned via conformal prediction. Furthermore, $\texttt{SGen}^{\texttt{Semi}}$ enables the use of a more general class of selection functions, neuro-selection functions, and provides users with an optimal selection function class given multiple candidates. Finally, we demonstrate that the $\texttt{SGen}$ family achieves a desired FDR-E level with selection efficiency comparable to that of baselines on both open- and closed-source GLMs. Code and datasets are provided at https://github.com/ml-postech/selective-generation.
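The PAC-style guarantee can be illustrated with a generic selective-prediction calibration. This is a sketch under stated assumptions, not the paper's $\texttt{SGen}^{\texttt{Sup}}$ algorithm. Candidate thresholds come from a grid fixed before looking at the calibration data. Each threshold's FDR-E is upper-bounded with a one-sided Clopper-Pearson bound computed on the selected examples. A Bonferroni correction over the grid makes all bounds hold simultaneously with probability at least $1-\delta$. The smallest certified threshold, the one that answers most often, is returned. The function names and the Bonferroni choice are hypothetical.

```python
import math
from typing import Sequence


def binom_cdf(k: int, n: int, p: float) -> float:
    """P[Binomial(n, p) <= k], summed in log space to avoid overflow for large n."""
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if k >= n else 0.0
    log_p, log_q, log_n_fact = math.log(p), math.log1p(-p), math.lgamma(n + 1)
    total = 0.0
    for i in range(k + 1):
        log_pmf = log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q
        total += math.exp(log_pmf)
    return min(total, 1.0)


def clopper_pearson_upper(k: int, n: int, delta: float) -> float:
    """One-sided (1 - delta) Clopper-Pearson upper bound on an error rate with k errors in n trials."""
    if k >= n:  # also covers n == 0: nothing selected, nothing can be certified
        return 1.0
    lo, hi = 0.0, 1.0  # invariant: binom_cdf(k, n, lo) > delta >= binom_cdf(k, n, hi)
    for _ in range(60):  # bisection; the CDF is decreasing in p
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > delta:
            lo = mid
        else:
            hi = mid
    return hi  # conservative side of the root


def calibrate_threshold(
    scores: Sequence[float],
    entailed: Sequence[bool],
    thresholds: Sequence[float],
    epsilon: float,
    delta: float,
) -> float:
    """Hypothetical calibration: smallest grid threshold whose FDR-E bound is <= epsilon.

    `thresholds` must be fixed before seeing the calibration data.
    Returns math.inf (always abstain) if no threshold can be certified.
    """
    per_test_delta = delta / len(thresholds)  # Bonferroni correction over the grid
    best = math.inf
    for tau in thresholds:
        selected = [e for s, e in zip(scores, entailed) if s >= tau]
        errors = sum(1 for e in selected if not e)
        if clopper_pearson_upper(errors, len(selected), per_test_delta) <= epsilon:
            best = min(best, tau)
    return best
```

Bonferroni is the simplest valid correction but becomes conservative for fine grids; a coarser grid trades threshold resolution for a tighter per-threshold bound.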

Paper Structure

This paper contains 36 sections, 6 theorems, 34 equations, 4 figures, 5 tables, and 10 algorithms.

Key Result

Lemma 1

Term (E) in the FDR-E decomposition (eq:fdrdecomp-ss-ssl) is decomposed as follows:

Figures (4)

  • Figure 1: An overview and qualitative results of our method with GPT-3.5-Turbo. The key idea is to learn an entailment-aware selective generator that can abstain and that controls the hallucination rate, measured as a false discovery rate over generated sequences, with a probabilistic guarantee.
  • Figure 2: Decomposition of a false discovery rate with respect to an entailment set $E_\text{true}$ (FDR-E). Here, $\Omega_{\text{TD}}^{E}\coloneqq \{(\mathbf{x}, \mathbf{y}, e, v) \mid G(\mathbf{x}) \in E(\mathbf{y})\}$.
  • Figure 3: Efficiency results for different numbers of unlabeled samples. (a) and (b) use SGen$^{\texttt{Semi}}_\texttt{NoMS}$ with the $f_{M_2}$ score. (c) and (d) use SGen$^\texttt{Semi}$, which has a neuro-selection function. Both methods improve as more unlabeled samples $\mathbf{Z}_U$ are used. Each value is averaged over 10 random splits, and error bars denote the standard deviation.
  • Figure 4: FDR-E box plots of methods for GPT-3.5-turbo. We randomly split the calibration and test sets 100 times to produce the box plots. For the supervised methods (a), we use all entailment labels, i.e., $|\mathbf{Z}_E|=|\mathbf{Z}_E^{\text{cal}}|$. For (b), which includes an unsupervised method (SGen$_\texttt{EM}$) and semi-supervised methods, we use $|\mathbf{Z}_E|=0.75|\mathbf{Z}_E^{\text{cal}}|$. All methods except SGen$^\texttt{Semi}$ use $f_{M_1}$ as the score function. In (a) and (b), methods that fail at least once to control FDR-E at level $\varepsilon_S$ during learning are drawn with red boxes; the others are drawn with green boxes. The whiskers indicate the $100\delta\%$ and $100(1-\delta)\%$ quantiles. In both (a) and (b), the tops of the whiskers of the green boxes lie below the dotted line, so FDR-E is controlled with probability at least $1-\delta$, i.e., these methods satisfy the PAC guarantee. The numbers of runs, out of 100, in which learning satisfies FDR-E at level $\varepsilon_S$ are (a) SGen$_\texttt{EM}$ = 0, SGen$^\texttt{Sup}$ = 100, SGen$^{\texttt{Semi-Sup}}_\texttt{NoMS}$ = 100 and (b) SGen$_\texttt{PL}^\texttt{H-Semi}$ = 100, SGen$_\texttt{PFL}^\texttt{H-Semi}$ = 100, SGen$^{\texttt{Semi}}_\texttt{NoMS}$ = 18, SGen$^\texttt{Semi}$ = 100.
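The whisker-based check described for Figure 4 can be mimicked with a minimal sketch. Given the FDR-E values measured on the test side of each random calibration/test split, compute the empirical $(1-\delta)$ quantile, which corresponds to the top whisker, and compare it with $\varepsilon_S$. The function names are hypothetical, and the lower-empirical-quantile rule is an assumption; plotting libraries may interpolate quantiles differently.

```python
import math
from typing import Sequence


def empirical_quantile(values: Sequence[float], q: float) -> float:
    """Lower empirical quantile: smallest value v with at least a q fraction of values <= v."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    idx = max(math.ceil(q * len(ordered)) - 1, 0)
    return ordered[idx]


def check_pac(fdr_values: Sequence[float], epsilon: float, delta: float) -> tuple[bool, int]:
    """Hypothetical check: does the (1 - delta) quantile of per-split FDR-E stay below epsilon?

    Returns (whisker_ok, number_of_splits_with_fdr_above_epsilon).
    """
    violations = sum(1 for v in fdr_values if v > epsilon)
    top_whisker = empirical_quantile(fdr_values, 1 - delta)
    return top_whisker <= epsilon, violations
```

With $\delta = 0.1$ and 100 splits, the check passes when at most about 10 splits exceed $\varepsilon_S$, which is what "controlled with probability at least $1-\delta$" predicts empirically.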

Theorems & Definitions (6)

  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Theorem 1
  • Theorem 2