Deep Bayesian Active Learning for Preference Modeling in Large Language Models

Luckeciano C. Melo; Panagiotis Tigas; Alessandro Abate; Yarin Gal

TL;DR

This work proposes the Bayesian Active Learner for Preference Modeling (BAL-PM), a novel stochastic acquisition policy that not only targets points of high epistemic uncertainty according to the preference model but also seeks to maximize the entropy of the acquired prompt distribution in the feature space spanned by the employed LLM.

Abstract

Leveraging human preferences for steering the behavior of Large Language Models (LLMs) has demonstrated notable success in recent years. Nonetheless, data selection and labeling are still a bottleneck for these systems, particularly at large scale. Hence, selecting the most informative points for acquiring human feedback may considerably reduce the cost of preference labeling and unleash the further development of LLMs. Bayesian Active Learning provides a principled framework for addressing this challenge and has demonstrated remarkable success in diverse settings. However, previous attempts to employ it for Preference Modeling did not meet such expectations. In this work, we identify that naive epistemic uncertainty estimation leads to the acquisition of redundant samples. We address this by proposing the Bayesian Active Learner for Preference Modeling (BAL-PM), a novel stochastic acquisition policy that not only targets points of high epistemic uncertainty according to the preference model but also seeks to maximize the entropy of the acquired prompt distribution in the feature space spanned by the employed LLM. Notably, our experiments demonstrate that BAL-PM requires 33% to 68% fewer preference labels in two popular human preference datasets and exceeds previous stochastic Bayesian acquisition policies.

Paper Structure (21 sections, 17 equations, 16 figures, 4 tables, 1 algorithm)


Introduction
Related Work
Preliminaries
Bayesian Active Learner for Preference Modeling
Experiments and Discussion
Experiments
Closing Remarks
Impact Statement
Reproducibility Statement
Hyperparameters
Ablation Studies
KL Entropy Estimator: Review and Assumptions
BAL-PM Objective -- Empirical Analysis
Is Log-Likelihood a proper performance measure for Preference Modeling?
Do the models better ranked by Average Log Likelihood (LL) lead to better fine-tuned policies?
...and 6 more sections

Figures (16)

Figure 1: Log-Likelihood of learned preference models in the Reddit TL;DR dataset [NEURIPS2020_1f89885d]. Our method, BAL-PM, reduces the volume of required human feedback by 33% over random acquisition.
Figure 2: An illustration of how BAL-PM works. For each tuple $(x, y_{1}, y_{2}) \in \mathcal{D}_{pool}$, we obtain features for the prompt and prompt-completion pairs by computing the last layer embeddings of the base LLM. We leverage the prompt feature space to estimate the entropy score of the acquired prompt distribution, $\hat{\mathcal{H}}({X}_{train} \cup \{x\})$. Similarly, we use the prompt-completion features as input for the Bayesian Preference Model, which is used to estimate task-dependent epistemic uncertainty scores, $\hat{U}(x, y_{1}, y_{2})$. BAL-PM selects the tuple that maximizes the linear combination of both scores.
Figure 3: Illustration of entropy estimators. The green point maximizes the entropy estimation of the prompt distribution (according to the employed estimator). Dashed lines represent its $k$-NN distance. In (a), the KL estimator (Equation \ref{eq:klent}) does not account for the available prompts in the pool (in red) and underestimates the density in regions not covered by the acquired set (in blue). In (b), the KSG estimator (Equation \ref{eq:ksg}) uses all data points, leading to better estimation and effectively selecting the point that maximizes the true entropy.
Figure 4: Comparison with baseline methods in Active Preference Modeling. BAL-PM considerably reduces the number of samples required for preference modeling, achieving reductions of 33% and 68% on the Reddit TL;DR test split and the CNN/DM News dataset, respectively. The shaded area corresponds to the standard error computed over five seeds.
Figure 5: Comparison with Bayesian stochastic acquisition policies for Active Preference Modeling. BAL-PM consistently outperforms other policies in Test and OOD settings.
...and 11 more figures
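
The selection rule described in the Figure 2 caption (pick the pool tuple that maximizes a linear combination of an epistemic uncertainty score and an entropy score) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the paper's implementation: the uncertainty score is computed as a BALD-style mutual information over a hypothetical ensemble of preference probabilities, the entropy score is replaced by a simpler proxy (the log distance from each candidate prompt to its k-th nearest acquired prompt, not the KSG estimator the paper refers to), and the names `select_tuple`, `beta`, and `k` are hypothetical.

```python
import numpy as np


def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli variable with success probability p."""
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))


def epistemic_uncertainty(member_probs):
    """BALD-style score: entropy of the mean prediction minus mean entropy.

    member_probs has shape (n_members, n_candidates); each entry is one
    ensemble member's probability that y1 is preferred over y2.
    """
    mean_p = member_probs.mean(axis=0)
    return binary_entropy(mean_p) - binary_entropy(member_probs).mean(axis=0)


def knn_log_distance(pool_feats, train_feats, k=1):
    """Illustrative diversity proxy (an assumption, not the KSG estimator):
    log distance from each pool prompt to its k-th nearest acquired prompt."""
    diffs = pool_feats[:, None, :] - train_feats[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)        # shape (n_pool, n_train)
    kth = np.sort(dists, axis=1)[:, k - 1]        # requires k <= n_train
    return np.log(kth + 1e-12)


def select_tuple(member_probs, pool_feats, train_feats, beta=1.0, k=1):
    """Return the index of the pool tuple with the highest combined score."""
    score = epistemic_uncertainty(member_probs) + beta * knn_log_distance(
        pool_feats, train_feats, k
    )
    return int(np.argmax(score)), score
```

In this sketch, `beta` trades off the two terms: with `beta=0.0` the rule reduces to picking the candidate on which the ensemble members disagree most, while larger values favor prompts that lie far from the already acquired set in feature space.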
