A Planning Framework for Adaptive Labeling

Daksh Mittal, Yuanzhe Ma, Shalmali Joshi, Hongseok Namkoong

TL;DR

This work proposes Smoothed-Autodiff, a direct backpropagation-based approach built on a carefully smoothed version of the original non-differentiable MDP. The method enjoys low variance at the price of introducing bias, and the paper shows theoretically and empirically that this trade-off can be favorable.

Abstract

Ground truth labels/outcomes are critical for advancing scientific and engineering applications, e.g., evaluating the treatment effect of an intervention or performance of a predictive model. Since randomly sampling inputs for labeling can be prohibitively expensive, we introduce an adaptive labeling framework where measurement effort can be reallocated in batches. We formulate this problem as a Markov decision process where posterior beliefs evolve over time as batches of labels are collected (state transition), and batches (actions) are chosen to minimize uncertainty at the end of data collection. We design a computational framework that is agnostic to different uncertainty quantification approaches including those based on deep learning, and allows a diverse array of policy gradient approaches by relying on continuous policy parameterizations. On real and synthetic datasets, we demonstrate even a one-step lookahead policy can substantially outperform common adaptive labeling heuristics, highlighting the virtue of planning. On the methodological side, we note that standard REINFORCE-style policy gradient estimators can suffer high variance since they rely only on zeroth order information. We propose a direct backpropagation-based approach, Smoothed-Autodiff, based on a carefully smoothed version of the original non-differentiable MDP. Our method enjoys low variance at the price of introducing bias, and we theoretically and empirically show that this trade-off can be favorable.
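To make the MDP view concrete, the following is a minimal sketch of a one-step lookahead under strong simplifying assumptions that are not taken from the paper: a Gaussian linear model with known noise variance stands in for the posterior "state", and the estimand is the mean prediction over an evaluation set rather than model performance or a treatment effect. All function names (`posterior_cov_after`, `estimand_variance`, `one_step_lookahead`) are hypothetical. Under this model the posterior covariance after labeling does not depend on the labels themselves, so the lookahead can be computed in closed form.

```python
import numpy as np

def posterior_cov_after(Sigma, X_batch, noise_var):
    """State transition (assumed Gaussian linear model): posterior covariance
    of the weights after labeling the rows of X_batch."""
    precision = np.linalg.inv(Sigma) + X_batch.T @ X_batch / noise_var
    return np.linalg.inv(precision)

def estimand_variance(Sigma, X_eval):
    """Posterior variance of the estimand g(w) = mean(X_eval @ w)."""
    a = X_eval.mean(axis=0)
    return float(a @ Sigma @ a)

def one_step_lookahead(Sigma, X_pool, X_eval, noise_var=1.0):
    """Score each candidate by the estimand variance that would remain
    after labeling it, and return the index of the best candidate."""
    scores = [
        estimand_variance(posterior_cov_after(Sigma, x[None, :], noise_var), X_eval)
        for x in X_pool
    ]
    return int(np.argmin(scores)), scores

# Toy problem: the estimand only depends on the first coordinate.
Sigma0 = np.eye(2)                               # prior covariance over weights
X_eval = np.array([[1.0, 0.0], [1.0, 0.0]])      # population of interest
X_pool = np.array([[0.0, 1.0], [1.0, 0.0]])      # unlabeled candidates
best, scores = one_step_lookahead(Sigma0, X_pool, X_eval)
print("selected candidate:", best, "remaining variances:", scores)
```

In this toy problem the lookahead selects the candidate aligned with the evaluation set, because labeling the other candidate leaves the uncertainty on the estimand unchanged. The paper's framework is described as agnostic to the uncertainty quantification method, so the closed-form update here should be read as one convenient special case.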

Paper Structure

This paper contains 68 sections, 2 theorems, 54 equations, 35 figures, 3 tables, 3 algorithms.

Key Result

Theorem 1

For $\theta \in [0,1]$ and $N \le \frac{1}{4\theta(1-\theta)} - 1$, there exists $\tilde{\tau}$ depending on $(N, \theta)$ such that $\mathsf{MSE}(\hat{\nabla}^{\mathsf{grad}}_{\tilde{\tau},N}) < \mathsf{MSE}(\hat{\nabla}^{\mathsf{RF}}_{N})$. Additionally, for any $N$, $\theta = \frac{1}{k}$, and $k \to \infty$, we have $\mathsf{MSE}(\hat{\nabla}^{\mathsf{RF}}_{N}) = \Omega(k)$ while $\mathsf{MSE}(\hat{\nabla}^{\mathsf{grad}}_{\tilde{\tau},N}) < 4$. The same statement holds for $\theta = 1-\frac{1}{k}$.
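The bias-variance trade-off in the theorem can be illustrated numerically. The sketch below is an assumed toy setting, not necessarily the one analyzed in the paper: the objective is $\mathbb{E}[A]$ with $A \sim \mathrm{Bernoulli}(\theta)$, so the true gradient with respect to $\theta$ is 1. The REINFORCE estimator uses the score function, while the smoothed estimator differentiates through a relaxed Bernoulli sample (a sigmoid of logistic noise with temperature `tau`). The temperature is hand-picked for illustration and is not the $\tilde{\tau}$ of the theorem; all function names are hypothetical.

```python
import numpy as np

def reinforce_grad(theta, n, rng):
    """Score-function estimate of d/dtheta E[A], A ~ Bernoulli(theta)."""
    a = (rng.random(n) < theta).astype(float)
    score = a / theta - (1.0 - a) / (1.0 - theta)   # d/dtheta log p_theta(a)
    return float(np.mean(a * score))

def smoothed_grad(theta, tau, n, rng):
    """Pathwise estimate through a relaxed Bernoulli sample
    s = sigmoid((logit(theta) + logit(U)) / tau), U ~ Uniform(0, 1)."""
    u = np.clip(rng.random(n), 1e-12, 1.0 - 1e-12)
    z = (np.log(theta) - np.log1p(-theta) + np.log(u) - np.log1p(-u)) / tau
    s = 0.5 * (1.0 + np.tanh(0.5 * z))              # numerically stable sigmoid
    # ds/dtheta = s * (1 - s) * dz/dtheta, with dz/dtheta = 1 / (tau * theta * (1 - theta))
    return float(np.mean(s * (1.0 - s) / (tau * theta * (1.0 - theta))))

def empirical_mse(estimator, true_grad=1.0, reps=20000, seed=0):
    """Monte Carlo estimate of the mean squared error of a gradient estimator."""
    rng = np.random.default_rng(seed)
    errs = np.array([estimator(rng) - true_grad for _ in range(reps)])
    return float(np.mean(errs ** 2))

theta, n = 0.02, 5                                   # small theta, few samples
tau = 1.0 / (4.0 * theta * (1.0 - theta))            # illustrative temperature (assumption)
mse_rf = empirical_mse(lambda rng: reinforce_grad(theta, n, rng))
mse_smooth = empirical_mse(lambda rng: smoothed_grad(theta, tau, n, rng))
print(f"REINFORCE MSE: {mse_rf:.3f}, smoothed MSE: {mse_smooth:.5f}")
```

In this setting a single REINFORCE sample has variance $\frac{1}{\theta} - 1$, which grows without bound as $\theta \to 0$. The smoothed estimator is biased but nearly deterministic at a large temperature, which is consistent with the qualitative claim of the theorem.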

Figures (35)

  • Figure 1: Adaptive labeling to reduce epistemic uncertainty over model performance. Among the two clusters of unlabeled examples (left vs. right), we must learn to prioritize labeling inputs from the left cluster to better evaluate mean squared error.
  • Figure 2: Overview of our adaptive sampling framework. At each period, we select a batch of inputs $\mathcal{X}^t$ to be labeled, and obtain new labeled data $\mathcal{D}_t$. We view posterior beliefs $\mu_t(\cdot)$ on $f^\star(Y|X)$ (or $f^\star(Y|X,Z)$) as the "state", and update them as additional labeled data is collected. Our goal is to minimize uncertainty on the estimand of interest (performance of a predictive model or the ATE) at the end of $T$ periods.
  • Figure 3: MDP framework for adaptive labeling to efficiently estimate the average treatment effect (ATE).
  • Figure 4: One-step lookahead roll-out for policy gradient estimation.
  • Figure 5: Differentiable one-step lookahead pipeline for efficient adaptive sampling.
  • ...and 30 more figures

Theorems & Definitions (2)

  • Theorem 1
  • Theorem 2