A Probabilistic Approach for Model Alignment with Human Comparisons

Junyu Cao; Mohsen Bayati

TL;DR

The paper proposes a two-stage "Supervised Learning+Learning from Human Feedback" (SL+LHF) framework that connects machine learning with human feedback through a probabilistic bisection approach, and it identifies the conditions under which the SL+LHF framework outperforms the pure SL approach.

Abstract

A growing trend involves integrating human knowledge into learning frameworks, leveraging subtle human feedback to refine AI models. While these approaches have shown promising results in practice, the theoretical understanding of when and why they are effective remains limited. This work takes steps toward a theoretical framework for analyzing the conditions under which human comparisons can enhance the traditional supervised learning process. Specifically, this paper studies the effective use of noisy-labeled data and human comparison data to address challenges arising from noisy environments and high-dimensional models. We propose a two-stage "Supervised Learning+Learning from Human Feedback" (SL+LHF) framework that connects machine learning with human feedback through a probabilistic bisection approach. The framework first learns low-dimensional representations from noisy-labeled data via an SL procedure and then uses human comparisons to improve model alignment. To examine the efficacy of the alignment phase, we introduce the "label-noise-to-comparison-accuracy" (LNCA) ratio. This paper identifies, from a theoretical perspective, the conditions under which the SL+LHF framework outperforms the pure SL approach; we then use the LNCA ratio to highlight the advantage of incorporating human evaluators in reducing sample complexity. Through a case study conducted via Amazon Mechanical Turk (MTurk), we validate that the LNCA ratio meets the proposed conditions for its use.

Paper Structure (39 sections, 18 theorems, 122 equations, 9 figures, 4 algorithms)

Introduction
Our Contributions
Modeling.
Theoretical contributions.
Towards practical implementations.
Empirical analysis.
Literature Review
Model
Supervised Learning and Underspecifications
Utility Model of Human Comparisons
One-dimensional Human Comparison
Deterministic Bisection
Probabilistic Bisection
Vertical moves
Horizontal moves
...and 24 more sections

Key Result

Lemma 1

For any two models $f_{\boldsymbol{\theta}_1}$ and $f_{\boldsymbol{\theta}_2}$, the probability that a human makes the right selection is strictly greater than $1/2$.
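
A comparison accuracy strictly above 1/2 is the property that lets a probabilistic bisection make progress despite noisy answers. The following is a minimal sketch of a generic, textbook-style probabilistic bisection on a one-dimensional parameter, not the paper's algorithm. It assumes a known, constant accuracy `p`, a parameter in `[0, 1]`, and a simulated evaluator; the function name, the grid discretization, and all default values are hypothetical choices made for illustration.

```python
import numpy as np

def probabilistic_bisection(theta_star, p=0.75, n_queries=200, grid_size=1001, seed=0):
    """Generic probabilistic bisection on [0, 1] (illustrative sketch only).

    Assumptions (not from the paper): the evaluator answers correctly with a
    known, constant probability p > 1/2, and the posterior is kept on a grid.
    """
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, grid_size)
    posterior = np.full(grid_size, 1.0 / grid_size)  # uniform prior over the grid

    for _ in range(n_queries):
        # Query at the posterior median.
        cdf = np.cumsum(posterior)
        median = grid[np.searchsorted(cdf, 0.5)]

        # Simulated noisy comparison: is the target to the right of the median?
        truth_is_right = bool(theta_star > median)
        correct = rng.random() < p
        answer_is_right = truth_is_right if correct else not truth_is_right

        # Bayes update: grid points on the side indicated by the answer get
        # weight p, points on the other side get weight 1 - p.
        right_side = grid > median
        likelihood = np.where(right_side == answer_is_right, p, 1.0 - p)
        posterior = posterior * likelihood
        posterior /= posterior.sum()

    # Report the final posterior median as the estimate.
    cdf = np.cumsum(posterior)
    return float(grid[np.searchsorted(cdf, 0.5)])

estimate = probabilistic_bisection(theta_star=0.3141)
```

Unlike a deterministic bisection, no interval is ever discarded outright; a wrong answer only down-weights the correct side, so later answers can restore it.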

Figures (9)

Figure 1: An illustrative example. Among two choices $c_{\Delta}^-(\theta_k)$ and $c_{\Delta}^+(\theta_k)$, $c_{\Delta}^-(\theta_k)$ would be selected because it is closer to the true parameter. In this case, the interval $(\theta_k, \theta^+]$ is eliminated.
Figure 2: Roadmap for introducing the algorithm referenced as Alg: RTB.
Figure 3: Error ratio of the two-stage framework to the pure SL with different values of $\sigma$.
Figure 4: Error ratio of the two-stage framework to the pure SL with different values of $\gamma$.
Figure 5: Error ratio of the two-stage framework to the pure SL with different values of $s$.
...and 4 more figures
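
The elimination step described in the Figure 1 caption can be sketched as a noise-free bisection. This is a rough illustration under stated assumptions, not the paper's procedure: the two choices are taken to be `theta_k - delta` and `theta_k + delta`, the evaluator always picks the one closer to the true parameter, and the function name, search interval, and tolerances are hypothetical.

```python
def deterministic_bisection(theta_star, lo=0.0, hi=1.0, delta=1e-3, tol=1e-2):
    """Noise-free bisection driven by pairwise comparisons (illustrative sketch).

    Assumption (not from the paper): the two choices at the query point theta_k
    are theta_k - delta and theta_k + delta, and the evaluator is always right.
    """
    while hi - lo > tol:
        theta_k = 0.5 * (lo + hi)
        c_minus, c_plus = theta_k - delta, theta_k + delta

        # The evaluator selects whichever choice is closer to the true parameter.
        if abs(c_minus - theta_star) < abs(c_plus - theta_star):
            hi = theta_k  # c_minus selected: eliminate (theta_k, hi]
        else:
            lo = theta_k  # c_plus selected: eliminate [lo, theta_k)

    return 0.5 * (lo + hi)

estimate = deterministic_bisection(theta_star=0.62)
```

Each comparison halves the remaining interval, so the number of queries grows only logarithmically in `1 / tol`; a single wrong answer, however, would discard the true parameter for good, which is the weakness the probabilistic variant addresses.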

Theorems & Definitions (45)

Example 1: Sparse Linear Models
Example 2: Generalized Low-rank Models
Example 3
Remark 1: Model-level Comparison and Sample-level Comparison
Example 4
Remark 2
Lemma 1: Precision
Proposition 1
Example 5
Definition 1: $(\varepsilon,\delta)$-alignment problem
...and 35 more
