Score-Based Density Estimation from Pairwise Comparisons

Petrus Mikkola; Luigi Acerbi; Arto Klami

TL;DR

The paper tackles learning a target density from pairwise comparisons by linking it to a tempered marginal winner density (MWD) via a position-dependent tempering field, which enables score-based estimation. It proves that under Bradley–Terry (BT) and exponential random utility models (RUMs) the scores satisfy $\nabla \log p(\boldsymbol{x}) = \tau(\boldsymbol{x}) \nabla \log p_w(\boldsymbol{x})$, and develops a practical diffusion-based method that recovers $p$ by estimating $p_w$ and the tempering field $\tau$, then sampling from $p$ with score-scaled annealed Langevin dynamics (ALD). The approach models the joint and marginal winner densities together to obtain the MWD score, and estimates the tempering field through a learned density ratio under BT. It demonstrates improved accuracy over prior flow-based methods on synthetic targets and on a real-data proxy (an LLM), using hundreds to thousands of pairwise queries. The work is aimed at expert-knowledge elicitation and, potentially, fine-tuning of generative systems, for which it provides a principled score-based mechanism to represent and sample from human beliefs.

Abstract

We study density estimation from pairwise comparisons, motivated by expert knowledge elicitation and learning from human feedback. We relate the unobserved target density to a tempered winner density (marginal density of preferred choices), learning the winner's score via score-matching. This allows estimating the target by 'de-tempering' the estimated winner density's score. We prove that the score vectors of the belief and the winner density are collinear, linked by a position-dependent tempering field. We give analytical formulas for this field and propose an estimator for it under the Bradley-Terry model. Using a diffusion model trained on tempered samples generated via score-scaled annealed Langevin dynamics, we can learn complex multivariate belief densities of simulated experts, from only hundreds to thousands of pairwise comparisons.
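The 'de-tempering' idea can be illustrated with a toy example. The sketch below is not the paper's method: it assumes a one-dimensional Gaussian winner density with a known score, a constant tempering field, and plain (non-annealed) Langevin dynamics. The names `winner_score`, `tempering_field`, and `score_scaled_langevin`, and all parameter values, are hypothetical. Multiplying the winner score by the tempering field gives the target score, and Langevin updates driven by that scaled score draw approximate samples from the target.

```python
import numpy as np

def winner_score(x, sigma_w=2.0):
    # Score of a toy winner density N(0, sigma_w^2).
    # Assumption: stands in for a score model learned from comparison winners.
    return -x / sigma_w**2

def tempering_field(x, tau=4.0):
    # Assumption: a constant field for illustration only.
    # In the paper the tempering field depends on the position x.
    return np.full_like(x, tau)

def score_scaled_langevin(n_samples=20000, n_steps=1000, step=1e-2, seed=0):
    # Unadjusted Langevin dynamics driven by tau(x) * grad log p_w(x),
    # which by the collinearity relation equals grad log p(x).
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n_samples)
    for _ in range(n_steps):
        target_score = tempering_field(x) * winner_score(x)
        noise = rng.normal(size=n_samples)
        x = x + step * target_score + np.sqrt(2.0 * step) * noise
    return x

samples = score_scaled_langevin()
print("sample mean:", samples.mean())
print("sample variance:", samples.var())
```

With these toy values the scaled score is $-x$, so the samples should be approximately standard normal: narrower than the winner density, whose variance is 4. The discretised dynamics carry a small step-size bias, so the sample variance only approximates 1.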
