Learning Parametric Distributions from Samples and Preferences

Marc Jourdan; Gizem Yüce; Nicolas Flammarion

TL;DR

This work investigates when preference feedback improves parameter estimation for continuous parametric distributions. It shows that stochastic preferences yield a lower asymptotic variance than sample-only estimators, and that deterministic preferences accelerate the estimation rate from $\Theta(1/\sqrt{n})$ to $\mathcal{O}(1/n)$ through hard, constraint-driven updates, with a lower bound matching this rate up to dimension and problem-dependent constants. The results hold under general but restrictive geometric conditions and are validated in Gaussian and Laplace settings with log-probability rewards. The findings have practical implications for preference-based fine-tuning and iterative alignment, suggesting new estimation paradigms that exploit hard preference constraints for faster learning in high-stakes settings.
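A small simulation can make the variance claim concrete. The sketch below is illustrative only and rests on several assumptions not stated above: a one-dimensional Gaussian $\mathcal{N}(\theta, 1)$, the log-probability reward, and a logistic (Bradley-Terry) link for stochastic preferences, so that $P(X \succ Y) = \sigma(\log p_\theta(X) - \log p_\theta(Y))$. For this Gaussian the reward gap simplifies to $(X - Y)(\theta - (X+Y)/2)$. The helper names `sigmoid` and `simulate` are hypothetical. The code compares the sample-only MLE (the mean of all $2n$ samples) with a joint M-estimator that also uses the preferences, fitted by Newton's method on the concave joint log-likelihood.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function 1 / (1 + exp(-z)).
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def simulate(theta_star=0.0, n=500, trials=2000, seed=0):
    """Return (var of sample-only MLE, var of sample+preference MLE) across trials.

    Hypothetical setup: X, Y ~ N(theta, 1) and, as an assumption,
    P(X preferred to Y) = sigmoid(log p_theta(X) - log p_theta(Y)).
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(theta_star, 1.0, size=(trials, n))
    y = rng.normal(theta_star, 1.0, size=(trials, n))
    d, m = x - y, 0.5 * (x + y)
    # For N(theta, 1): log p(x) - log p(y) = (x - y) * (theta - (x + y) / 2).
    o = (rng.random((trials, n)) < sigmoid(d * (theta_star - m))).astype(float)

    # Sample-only M-estimator: the MLE is the mean of all 2n samples.
    theta_samples = 0.5 * (x.mean(axis=1) + y.mean(axis=1))

    # Joint M-estimator: Newton's method on the concave joint log-likelihood
    # (samples + preferences), started at the sample-only estimate.
    theta = theta_samples.copy()
    for _ in range(30):
        s = sigmoid(d * (theta[:, None] - m))
        grad = 2 * n * (theta_samples - theta) + ((o - s) * d).sum(axis=1)
        hess = -2 * n - (s * (1.0 - s) * d**2).sum(axis=1)
        theta = theta - grad / hess
    return theta_samples.var(), theta.var()


var_samples, var_joint = simulate()
print(f"sample-only: {var_samples:.2e}  joint: {var_joint:.2e}")
```

With $n$ pairs, the sample-only variance should sit near $1/(2n)$; the preferences add strictly positive Fisher information to the joint likelihood, so under this assumed link the joint estimator's variance is expected to be somewhat smaller.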

Abstract

Recent advances in language modeling have underscored the role of preference feedback in enhancing model performance. This paper investigates the conditions under which preference feedback improves parameter estimation in classes of continuous parametric distributions. In our framework, the learner observes pairs of samples from an unknown distribution, along with their relative preference, which depends on the same unknown parameter. We show that preference-based M-estimators achieve a better asymptotic variance than sample-only M-estimators, and that deterministic preferences improve it further. Leveraging the hard constraints revealed by deterministic preferences, we propose an estimator whose estimation error scales as $\mathcal{O}(1/n)$, a significant improvement over the $\Theta(1/\sqrt{n})$ rate attainable with samples alone. Next, we establish a lower bound that matches this accelerated rate up to dimension and problem-dependent constants. While the assumptions underpinning our analysis are restrictive, they are satisfied by notable cases such as Gaussian or Laplace distributions with preferences based on the log-probability reward.
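To see why hard constraints can beat the $\Theta(1/\sqrt{n})$ rate, consider a one-dimensional Gaussian $\mathcal{N}(\theta, 1)$ with the log-probability reward. A deterministic preference for $X$ over $Y$ means $|X - \theta| \le |Y - \theta|$, which places $\theta$ on the winner's side of the midpoint $(X+Y)/2$. Intersecting these half-lines gives a feasible interval around $\theta$ whose endpoints are the nearest midpoints on either side, so its width shrinks like $1/n$. The sketch below returns the centre of that interval; it is a hypothetical illustration under these assumptions (unit variance, noise-free preferences, helper names `feasible_interval_estimate` and `mean_abs_errors` are ours), not the estimator proposed in the paper.

```python
import numpy as np


def feasible_interval_estimate(x, y, x_preferred):
    """Hypothetical feasible-set estimator for N(theta, 1) with deterministic preferences.

    Assumed model: X is preferred to Y iff log p_theta(X) >= log p_theta(Y),
    i.e. iff |X - theta| <= |Y - theta|, so theta lies on the winner's side
    of the midpoint (X + Y) / 2.
    """
    m = 0.5 * (x + y)
    winner = np.where(x_preferred, x, y)
    # Winner above its midpoint => theta >= m; winner below => theta <= m.
    lower = m[winner > m].max(initial=-np.inf)
    upper = m[winner < m].min(initial=np.inf)
    # Centre of the feasible interval [lower, upper]; assumes both ends are finite.
    return 0.5 * (lower + upper), lower, upper


def mean_abs_errors(n, trials=500, theta_star=0.3, seed=0):
    """Mean |error| of the feasible-interval estimate and of the sample mean."""
    rng = np.random.default_rng(seed)
    err_pref, err_mean = [], []
    for _ in range(trials):
        x = rng.normal(theta_star, 1.0, n)
        y = rng.normal(theta_star, 1.0, n)
        x_pref = np.abs(x - theta_star) <= np.abs(y - theta_star)
        est, _, _ = feasible_interval_estimate(x, y, x_pref)
        err_pref.append(abs(est - theta_star))
        err_mean.append(abs(0.5 * (x.mean() + y.mean()) - theta_star))
    return float(np.mean(err_pref)), float(np.mean(err_mean))


for n in (100, 1_000, 10_000):
    pref, smp = mean_abs_errors(n)
    print(f"n={n:>6}  preferences: {pref:.2e}  sample mean: {smp:.2e}")
```

Under this setup one would expect the preference-based error to shrink roughly 100-fold as $n$ grows from 100 to 10 000, against roughly 10-fold for the sample mean, in line with the $\mathcal{O}(1/n)$ versus $\Theta(1/\sqrt{n})$ rates.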

