Active learning with biased non-response to label requests

Thomas Robinson, Niek Tax, Richard Mudd, Ido Guy

TL;DR

The paper examines how biased non-response to label requests can undermine active learning (AL) in real-world, human-in-the-loop settings. It introduces the Upper Confidence Bound of the Expected Utility (UCB-EU), a simple, plug-in correction that weights query selection by the estimated probability of obtaining a label, using an upper confidence bound to handle uncertainty in that estimate. Through synthetic experiments under missing-at-random (MAR) and missing-completely-at-random (MCAR) non-response, and a Taobao case study, it shows that UCB-EU can improve several AL strategies (e.g., Query-by-Committee, random sampling) and yield meaningful gains in CTR/conversion tasks, while also identifying scenarios where bias persists. The work further highlights that non-response can induce local optima in learned decision boundaries, motivating future development of more robust abstention-aware AL methods.

Abstract

Active learning can improve the efficiency of training prediction models by identifying the most informative new labels to acquire. However, non-response to label requests can impact active learning's effectiveness in real-world contexts. We conceptualise this degradation by considering the type of non-response present in the data, demonstrating that biased non-response is particularly detrimental to model performance. We argue that biased non-response is likely in contexts where the labelling process, by nature, relies on user interactions. To mitigate the impact of biased non-response, we propose a cost-based correction to the sampling strategy--the Upper Confidence Bound of the Expected Utility (UCB-EU)--that can, plausibly, be applied to any active learning algorithm. Through experiments, we demonstrate that our method successfully reduces the harm from labelling non-response in many settings. However, we also characterise settings where the non-response bias in the annotations remains detrimental under UCB-EU for specific sampling methods and data generating processes. Finally, we evaluate our method on a real-world dataset from an e-commerce platform. We show that UCB-EU yields substantial performance improvements to conversion models that are trained on clicked impressions. Most generally, this research serves to both better conceptualise the interplay between types of non-response and model improvements via active learning, and to provide a practical, easy-to-implement correction that mitigates model degradation.
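
The correction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the UCB-EU formula, so the score used here (a base utility multiplied by an upper confidence bound on the response probability), the Hoeffding-style exploration bonus, the per-candidate response counts, and the names `ucb_response_prob`, `ucb_eu_scores`, and `c` are all assumptions made for illustration.

```python
import numpy as np


def ucb_response_prob(successes, attempts, total_queries, c=1.0):
    """Optimistic estimate of the probability that a label request is answered.

    Assumption: each candidate (or candidate group) has its own count of past
    requests and responses. The bonus is a generic Hoeffding-style term, not
    a formula taken from the paper.
    """
    successes = np.asarray(successes, dtype=float)
    attempts = np.asarray(attempts, dtype=float)
    safe_attempts = np.maximum(attempts, 1.0)
    # Candidates that were never queried get an optimistic prior of 1.0.
    mean = np.where(attempts > 0, successes / safe_attempts, 1.0)
    bonus = c * np.sqrt(np.log(max(total_queries, 0) + 1.0) / safe_attempts)
    return np.clip(mean + bonus, 0.0, 1.0)


def ucb_eu_scores(utility, successes, attempts, total_queries, c=1.0):
    """Weight any AL utility (e.g. committee disagreement) by the optimistic
    response probability, so unanswerable queries lose priority."""
    utility = np.asarray(utility, dtype=float)
    return utility * ucb_response_prob(successes, attempts, total_queries, c)


if __name__ == "__main__":
    # Hypothetical pool of two candidates: the first looks more informative
    # but has never responded in 50 requests.
    utility = np.array([0.9, 0.5])
    scores = ucb_eu_scores(utility, successes=[0, 40], attempts=[50, 50],
                           total_queries=100)
    print("raw choice:", int(np.argmax(utility)))
    print("corrected choice:", int(np.argmax(scores)))
```

In this toy example the candidate with the highest raw utility has a long record of non-response, so the corrected score favours the second candidate, whose label is likely to be obtained. Because the weight multiplies whatever utility is supplied, the same wrapper could sit on top of different sampling strategies, which is the sense in which the abstract calls the correction applicable to any active learning algorithm.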

Paper Structure

This paper contains 15 sections, 8 equations, 9 figures, 1 table, and 1 algorithm.

Figures (9)

  • Figure 1: The knock-on consequences of non-response on AL. From the same initial model, non-response leads to volume and imbalance effects in the AL sequence. Here, the result of these effects is the repeated querying of a non-responsive example. Colored blocks refer to data values and the red cross indicates a non-response label.
  • Figure 2: Illustration of synthetic datasets used in AL experiments. Note: MAR-1 is a restricted view of the data, and only shows two of the five X dimensions.
  • Figure 3: AL model performance in the presence of non-response, using different sampling strategies. $\mathbb{E}[R] = 0.3$ across all simulations. Observations in the missing region had a 0.001 probability of response. Shaded areas show the 95% confidence interval over 200 separate simulations (per non-response mechanism).
  • Figure 4: The effect of imbalance on model performance between MAR and MCAR non-response mechanisms. The probabilities above each panel indicate the probability of response in the low response region of the feature space. Shaded areas show 95% confidence intervals over 200 simulations.
  • Figure 5: Comparison of Query-by-Committee AL performance with and without UCB-EU correction, under MAR non-response. The DGP is identical to the results presented in Figure \ref{fig:curtailed_sims}.
  • ...and 4 more figures
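
The non-response settings named in the Figure 3 and Figure 4 captions (an overall response rate of $\mathbb{E}[R] = 0.3$, and a low-response region with a 0.001 probability of response) can be sketched in Python as follows. This is a hedged illustration, not the paper's data generating process: the rule that defines the low-response region (`X[:, 0] > 0`), the Gaussian features, and the function name `response_probs` are hypothetical choices.

```python
import numpy as np


def response_probs(X, mechanism, mean_response=0.3, low_prob=0.001):
    """Per-example probability that a label request is answered.

    MCAR: every example responds with the same probability.
    MAR:  the probability depends on observed features; examples in a
          (hypothetical) low-response region almost never respond, and the
          remaining examples are rescaled so the average stays fixed.
    """
    n = X.shape[0]
    if mechanism == "MCAR":
        return np.full(n, mean_response)
    if mechanism == "MAR":
        low = X[:, 0] > 0  # hypothetical region; not taken from the paper
        q = low.mean()
        if q >= 1.0:
            raise ValueError("every example falls in the low-response region")
        high_prob = (mean_response - q * low_prob) / (1.0 - q)
        if not 0.0 <= high_prob <= 1.0:
            raise ValueError("target mean response rate is not attainable")
        return np.where(low, low_prob, high_prob)
    raise ValueError(f"unknown mechanism: {mechanism!r}")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 5))
    for mechanism in ("MCAR", "MAR"):
        p = response_probs(X, mechanism)
        responded = rng.random(X.shape[0]) < p  # simulated response indicator
        print(mechanism, "mean response probability:", round(float(p.mean()), 3))
```

Both mechanisms yield the same number of labels on average, but under MAR the labels come almost entirely from one part of the feature space, which is the imbalance that the Figure 4 caption contrasts with MCAR.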