Beyond Labels: Information-Efficient Human-in-the-Loop Learning using Ranking and Selection Queries

Belén Martín-Urcelay, Yoonsang Lee, Matthieu R. Bloch, Christopher J. Rozell

TL;DR

This work tackles the information bottleneck in human-in-the-loop learning by replacing simple labels with information-rich queries (ranking and exemplar selection) to train binary classifiers more efficiently. It introduces probabilistic human response models grounded in the observed relationship between an item's perceived score and the distance of its embedding to the classifier, and develops a variational Bayesian framework with a greedy, information-theoretic query selection strategy. The approach yields theoretical bounds on the expected stopping time and substantial empirical gains: up to 85% fewer human interactions in word sentiment tasks, and a reduction in learning time of more than 57% on word sentiment classification when queries are chosen to maximize information rate. Experiments cover word-sentiment and image-aesthetics datasets. By leveraging the geometry of embeddings and cost-aware query planning, the method enables faster, more cost-effective alignment of models with nuanced human judgments.

Abstract

Integrating human expertise into machine learning systems often reduces the role of experts to labeling oracles, a paradigm that limits the amount of information exchanged and fails to capture the nuances of human judgment. We address this challenge by developing a human-in-the-loop framework to learn binary classifiers with rich query types, consisting of item ranking and exemplar selection. We first introduce probabilistic human response models for these rich queries, motivated by the experimentally observed relationship between the perceived implicit score of an item and its distance to the unknown classifier. Using these models, we then design active learning algorithms that leverage the rich queries to increase the information gained per interaction. We provide theoretical bounds on sample complexity and develop a tractable and computationally efficient variational approximation. Through experiments with simulated annotators derived from crowdsourced word-sentiment and image-aesthetic datasets, we demonstrate significant reductions in sample complexity. We further extend active learning strategies to select queries that maximize information rate, explicitly balancing informational value against annotation cost. On the word sentiment classification task, this algorithm reduces learning time by more than 57% compared to traditional label-only active learning.
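
The information-rate criterion mentioned in the abstract can be illustrated with a minimal sketch. It assumes each candidate query already comes with an expected information gain and an annotation cost; in the paper those gains would come from the response models and the variational posterior, which are not reproduced here. The class `Candidate`, the function `pick_by_information_rate`, and every number below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate query with made-up gain and cost values (assumptions)."""
    name: str
    expected_gain_bits: float  # hypothetical expected information gain
    cost_seconds: float        # hypothetical time the annotator needs


def pick_by_information_rate(candidates):
    """Return the candidate with the largest gain per unit of cost."""
    return max(candidates, key=lambda c: c.expected_gain_bits / c.cost_seconds)


def pick_by_information_gain(candidates):
    """Return the candidate with the largest gain, ignoring cost."""
    return max(candidates, key=lambda c: c.expected_gain_bits)


# Illustrative values only; they are not taken from the paper.
candidates = [
    Candidate("label", expected_gain_bits=0.9, cost_seconds=3.0),
    Candidate("selection", expected_gain_bits=2.0, cost_seconds=5.0),
    Candidate("ranking", expected_gain_bits=3.5, cost_seconds=14.0),
]

best_rate = pick_by_information_rate(candidates)
best_gain = pick_by_information_gain(candidates)
print(best_rate.name, best_gain.name)
```

With these made-up numbers the richest query (ranking) carries the most information, but the selection query wins once annotation time is taken into account, which is the trade-off the information-rate objective is meant to capture.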

Paper Structure

This paper contains 25 sections, 6 theorems, 44 equations, 16 figures, 5 tables, and 5 algorithms.

Key Result

Theorem 3.5

Let $T_\epsilon=\min\{t: \left|\boldsymbol\Sigma_{\boldsymbol\theta|\mathcal{F}_t}\right|^{1/d} < \epsilon\}$ be the stopping time of Algorithm algo.:ideal. Under Assumptions assum_humanResponse through assum.:prior, $\mathbb{E}[T_{\epsilon}]$ is bounded as with $N = (|\mathcal{S}|+1)!$ and $L=L_r$ for ranking queries $q = q_{\text{rank}}$, and with $N = 2|\mathcal{S}|$ and $L=L_s$ for exemplar s
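
The two quantities that the theorem statement defines explicitly can be computed with a short sketch: the stopping condition $|\boldsymbol\Sigma_{\boldsymbol\theta|\mathcal{F}_t}|^{1/d} < \epsilon$ and the value of $N$ for each query type. The bound itself is not reproduced, since it does not appear in the excerpt above. The function names (`theorem_n`, `log_det_spd`, `has_stopped`) and the example covariance are assumptions made for illustration, and the posterior covariance is assumed to be symmetric positive definite.

```python
import math


def theorem_n(query_type, set_size):
    """N as stated in the theorem: (|S|+1)! for ranking, 2|S| for selection."""
    if query_type == "rank":
        return math.factorial(set_size + 1)
    if query_type == "select":
        return 2 * set_size
    raise ValueError(f"unknown query type: {query_type}")


def log_det_spd(matrix):
    """Log-determinant of a symmetric positive definite matrix via Cholesky."""
    n = len(matrix)
    chol = [[0.0] * n for _ in range(n)]
    log_det = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                chol[i][i] = math.sqrt(s)
                log_det += 2.0 * math.log(chol[i][i])
            else:
                chol[i][j] = s / chol[j][j]
    return log_det


def det_root(cov):
    """Compute |cov|^(1/d), where d is the dimension of cov."""
    return math.exp(log_det_spd(cov) / len(cov))


def has_stopped(cov, eps):
    """Stopping condition from the theorem: |cov|^(1/d) < eps."""
    return det_root(cov) < eps


# Hypothetical 2-dimensional posterior covariance, for illustration only.
example_cov = [[0.04, 0.0], [0.0, 0.01]]
print(theorem_n("rank", 4), theorem_n("select", 4))
print(has_stopped(example_cov, eps=0.05))
```

Working with the log-determinant avoids underflow when $d$ is large and the posterior is concentrated, which is why the sketch exponentiates only after dividing by $d$.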

Figures (16)

  • Figure 1: Scores for word (a, b) and image (c, d) attributes as a function of the inner product between their pre-defined embedding and the ground truth classifier. We observe an approximately affine relationship. Pre-trained embeddings naturally encode score information as distance from the decision boundary, enabling information-rich queries beyond binary labels.
  • Figure 2: Block diagram for human-in-the-loop learning for sentiment word classification. At each interaction, the human annotator receives the query with items that maximize the information gain about the ground truth classifier $\boldsymbol{\theta}$. In the example, we ask the annotator to select a word from a list, and provide its label. The answer to the query is used to update the estimator of the classifier and select the next query items.
  • Figure 3: Performance of the human-in-the-loop learning algorithms with human data on the word sentiment analysis task. All configurations are run with 10 different random initializations. The lines represent the mean over those runs, while the shaded areas represent the standard error. Adding word selection or ranking to the queries, together with actively selecting the word set, reduces the number of iterations needed to achieve good performance.
  • Figure 4: Performance of Algorithm \ref{algo.:approx_selection} with human data on word sentiment classification. The larger the word set, the faster the decrease of MSE, as suggested by the lower bound in Theorem \ref{thm.:Exact_T_bounds}.
  • Figure 5: Performance of the human-in-the-loop learning algorithms with human data on the image aesthetic classification task across 10 initializations. There are $|\mathcal{S}| = 4$ candidate images for ranking and selection questions. The accuracy increases faster when asking richer queries.
  • ...and 11 more figures

Theorems & Definitions (12)

  • Theorem 3.5
  • proof
  • Lemma B.1
  • proof
  • Lemma B.2
  • proof
  • Corollary B.3
  • proof
  • Lemma B.4
  • proof
  • ...and 2 more