
Learning Linear Utility Functions From Pairwise Comparison Queries

Luise Ge, Brendan Juba, Yevgeniy Vorobeychik

TL;DR

We address learning linear utilities from pairwise comparisons under random utility models, focusing on two objectives: predicting pairwise outcomes ($e_1$) and recovering the true weights ($e_2$). The paper analyzes both passive and active learning. In passive learning, predicting preferences is efficiently achievable in the noise-free case and under Tsybakov-like noise with well-behaved inputs, but parameter recovery ($e_2$) is generally impossible without structural assumptions. Active learning, by contrast, yields algorithms with polynomial sample complexity for both tasks, including in noisy settings, and so exhibits a qualitative gap relative to passive learning. The key contributions include a tight characterization of when $e_1$ is PAC-PC learnable passively, a curvature-based condition yielding fast $e_2$-estimation bounds under BT-like noise, impossibility results without noise structure, and novel active-learning procedures with provable guarantees that reduce sample complexity under realistic noise models. These results have practical implications for reward modeling and preference-based learning in continuous, high-dimensional spaces, including RLHF-style settings, by formalizing when and how actively selected pairwise queries can substantially improve utility learning.

Abstract

We study learnability of linear utility functions from pairwise comparison queries. In particular, we consider two learning objectives. The first objective is to predict out-of-sample responses to pairwise comparisons, whereas the second is to approximately recover the true parameters of the utility function. We show that in the passive learning setting, linear utilities are efficiently learnable with respect to the first objective, both when query responses are uncorrupted by noise, and under Tsybakov noise when the distributions are sufficiently "nice". In contrast, we show that utility parameters are not learnable for a large set of data distributions without strong modeling assumptions, even when query responses are noise-free. Next, we proceed to analyze the learning problem in an active learning setting. In this case, we show that even the second objective is efficiently learnable, and present algorithms for both the noise-free and noisy query response settings. Our results thus exhibit a qualitative learnability gap between passive and active learning from pairwise preference queries, demonstrating the value of the ability to select pairwise queries for utility learning.

Paper Structure (9 sections, 11 theorems, 20 equations, 1 figure, 2 algorithms)


Key Result

Theorem 1

Suppose $\zeta=0$. Then linear utility functions are efficiently PAC-PC learnable under the error function $e_1$.
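
To give a concrete feel for the noise-free passive setting, the following is a minimal sketch and not the paper's algorithm. It assumes that a comparison between items x and x' is answered by the sign of w* · (x − x'), so that predicting comparisons reduces to finding a halfspace consistent with the difference vectors. The perceptron, the Gaussian item distribution, the margin-based rejection step, and all function names are illustrative assumptions rather than anything taken from the source.

```python
import numpy as np

def make_comparisons(w_true, n, rng, margin=0.05):
    """Sample n noise-free pairwise comparisons under a linear utility.

    Returns difference vectors d = x - x_alt and labels y (+1 if x is preferred).
    Pairs whose normalized utility gap is below `margin` are rejected; this is a
    demo-only assumption that bounds the number of perceptron updates.
    """
    dim = w_true.shape[0]
    diffs, labels = [], []
    while len(diffs) < n:
        x, x_alt = rng.normal(size=dim), rng.normal(size=dim)
        d = x - x_alt
        gap = (w_true @ d) / (np.linalg.norm(w_true) * np.linalg.norm(d))
        if abs(gap) < margin:
            continue
        diffs.append(d)
        labels.append(1.0 if gap > 0 else -1.0)
    return np.array(diffs), np.array(labels)

def fit_consistent_weights(diffs, labels, max_epochs=1000):
    """Perceptron on difference vectors: seek w with sign(w . d) == y for every pair."""
    w = np.zeros(diffs.shape[1])
    # Fold each label into its difference vector and normalize to unit length.
    signed = diffs * labels[:, None]
    signed = signed / np.linalg.norm(signed, axis=1, keepdims=True)
    for _ in range(max_epochs):
        mistakes = 0
        for z in signed:
            if w @ z <= 0:  # pair is misordered (or undecided) under current w
                w = w + z
                mistakes += 1
        if mistakes == 0:   # every training comparison is predicted correctly
            break
    return w

def comparison_error(w, diffs, labels):
    """Fraction of comparisons whose predicted outcome disagrees with the label."""
    return float(np.mean(np.sign(diffs @ w) != labels))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_star = rng.normal(size=5)  # hypothetical ground-truth utility weights
    train_d, train_y = make_comparisons(w_star, 500, rng)
    test_d, test_y = make_comparisons(w_star, 2000, rng)
    w_hat = fit_consistent_weights(train_d, train_y)
    print("train error:", comparison_error(w_hat, train_d, train_y))
    print("held-out error:", comparison_error(w_hat, test_d, test_y))
```

The learned vector is judged only by how well it predicts comparisons (the first objective); nothing in this sketch guarantees that it is close to the true weights, which is the distinction the two error functions are meant to capture.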

Figures (1)

  • Figure 1: From the label we know $w^*$ is below the hyperplane of $\Delta_\phi(x)$.
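
The caption describes how a single comparison label restricts where $w^*$ can lie. A minimal sketch of that idea follows, assuming that $\Delta_\phi(x)$ denotes the feature difference between the two compared items and that "below the hyperplane" means $w^* \cdot \Delta_\phi(x) < 0$. Both readings are assumptions, as are the function name and the finite candidate set.

```python
import numpy as np

def consistent_with_label(candidates, delta, label):
    """Keep the candidate weight vectors that agree with one comparison label.

    candidates: array of shape (m, d), one hypothetical weight vector per row.
    delta:      array of shape (d,), assumed feature difference of the two items.
    label:      +1 if the first item was preferred (w . delta > 0), else -1.
    """
    scores = candidates @ delta
    return candidates[label * scores > 0]

if __name__ == "__main__":
    candidates = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
    delta = np.array([1.0, -1.0])
    # A label of -1 discards every candidate on the positive side of the hyperplane.
    print(consistent_with_label(candidates, delta, -1))
```

Each answered query removes the candidates on the wrong side of one hyperplane through the origin, which is why the ability to choose informative queries matters in the active setting.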

Theorems & Definitions (24)

  • Definition 1: PAC-PC learnability
  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • Definition 2: Tsybakov Noise Condition
  • Definition 3: Well-Behaved Distributions
  • Theorem 3: Learning Tsybakov Halfspaces under Well-Behaved Distributions
  • Theorem 4
  • proof
  • ...and 14 more