An Adaptive Method for Weak Supervision with Drifting Data

Alessio Mazzetto, Reza Esfandiarpoor, Akash Singirikonda, Eli Upfal, Stephen H. Bach

TL;DR

This work tackles weak supervision under non-stationary (drifting) data, where weak labelers have time-varying accuracies $\mathbf{p}(t)$ and their errors are conditionally independent given the true label. It introduces an adaptive windowing algorithm that, at each time $t$, selects a window of past data from a set of candidate sizes $\mathcal{R}$ by comparing the empirical correlation matrices $\hat{\mathbf{C}}^{[r]}(t)$, so as to balance the error due to estimation variance against the error due to drift. It then estimates $\mathbf{p}(t)$ from the selected $\hat{\mathbf{C}}^{[r]}(t)$ using the estimation procedure of Bonald and Combes (2017). The method comes with formal guarantees on the estimation error $\|\mathbf{p}(t) - \hat{\mathbf{p}}(t)\|_\infty$, and experiments on synthetic and real data, including video and image tasks, show that the adaptive strategy can match or surpass fixed-window approaches. This drift-aware approach is particularly relevant for crowdsourcing and programmatic weak supervision, where distribution shifts are common and ground-truth labels are scarce or unavailable.
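
To make the step from a correlation matrix to accuracies concrete, here is a minimal sketch of a triplet-style estimator. It is not the Bonald and Combes procedure itself, only a simplified stand-in built on stated assumptions: labelers output values in $\{-1, +1\}$, each accuracy $p_i$ is the same for both classes, errors are conditionally independent, and every $p_i > 1/2$. Under those assumptions $C_{ij} = (2p_i - 1)(2p_j - 1)$ for $i \neq j$, so $(2p_i - 1)^2 = C_{ij} C_{ik} / C_{jk}$. The function name `accuracies_from_correlation` is hypothetical.

```python
import numpy as np

def accuracies_from_correlation(C):
    """Estimate accuracies p_i from a correlation matrix C (illustrative sketch).

    Assumes C[i, j] ~ (2 p_i - 1)(2 p_j - 1) for i != j and p_i > 1/2.
    """
    C = np.asarray(C, dtype=float)
    n = C.shape[0]
    if n < 3:
        raise ValueError("at least 3 labelers are needed to form a triplet")
    p = np.empty(n)
    for i in range(n):
        estimates = []
        # Use every pair (j, k) of other labelers to form a triplet with i.
        for j in range(n):
            for k in range(j + 1, n):
                if i in (j, k) or abs(C[j, k]) < 1e-12:
                    continue
                ratio = C[i, j] * C[i, k] / C[j, k]
                # Noise can push the ratio below zero; clip before the root.
                estimates.append(np.sqrt(max(ratio, 0.0)))
        # The median over triplets adds some robustness to noisy entries.
        s = np.median(estimates) if estimates else 0.0
        p[i] = 0.5 * (1.0 + min(s, 1.0))
    return p
```

With an exact matrix built from known accuracies, the function recovers them; with an empirical matrix, the error in the output depends on how far the input is from the true $\mathbf{C}(t)$, which is the quantity the window selection is meant to control.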

Abstract

We introduce an adaptive method with formal quality guarantees for weak supervision in a non-stationary setting. Our goal is to infer the unknown labels of a sequence of data by using weak supervision sources that provide independent noisy signals of the correct classification for each data point. This setting includes crowdsourcing and programmatic weak supervision. We focus on the non-stationary case, where the accuracy of the weak supervision sources can drift over time, e.g., because of changes in the underlying data distribution. Due to the drift, older data could provide misleading information to infer the label of the current data point. Previous work relied on a priori assumptions on the magnitude of the drift to decide how much data to use from the past. In contrast, our algorithm does not require any assumptions on the drift, and it adapts based on the input by dynamically varying its window size. In particular, at each step, our algorithm estimates the current accuracies of the weak supervision sources by identifying a window of past observations that guarantees a near-optimal minimization of the trade-off between the error due to the variance of the estimation and the error due to the drift. Experiments on synthetic and real-world labelers show that our approach adapts to the drift.
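
The window-selection idea described above can be sketched as follows. This is an illustration under assumptions, not the paper's algorithm: the candidate window sizes, the stopping rule, and the threshold `threshold_const / sqrt(r)` are placeholders chosen for readability (the paper's actual threshold and constants are not reproduced here), and the names `empirical_correlation` and `select_window` are hypothetical. The sketch keeps enlarging the window while the correlation matrix of the larger window stays close to that of the current one, and stops as soon as the gap exceeds what sampling noise alone would plausibly explain.

```python
import numpy as np

def empirical_correlation(votes):
    """Average of outer products of the +/-1 vote vectors in `votes` (shape r x n)."""
    votes = np.asarray(votes, dtype=float)
    return votes.T @ votes / votes.shape[0]

def select_window(history, window_sizes, threshold_const=1.0):
    """Pick a window of the most recent rows of `history` (illustrative sketch).

    history: (T, n) array of +/-1 votes, oldest row first.
    window_sizes: candidate window sizes (assumed, e.g. powers of two).
    threshold_const: placeholder constant for the noise threshold.
    """
    history = np.asarray(history, dtype=float)
    sizes = [r for r in sorted(window_sizes) if r <= len(history)]
    if not sizes:
        raise ValueError("no candidate window fits in the available history")
    chosen = sizes[0]
    C_chosen = empirical_correlation(history[-chosen:])
    for r_next in sizes[1:]:
        C_next = empirical_correlation(history[-r_next:])
        gap = np.max(np.abs(C_next - C_chosen))
        # A gap larger than the noise level of the current window suggests drift.
        if gap > threshold_const / np.sqrt(chosen):
            break
        chosen, C_chosen = r_next, C_next
    return chosen, C_chosen
```

On stationary votes the sketch grows to the largest candidate window, while a change in the labelers' agreement pattern makes it stop at the last window that precedes the change. The returned matrix can then be passed to any estimator that maps a correlation matrix to accuracies.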

Paper Structure (23 sections, 13 theorems, 58 equations, 14 figures, 2 tables, 1 algorithm)


Key Result

Proposition 4.2

Let $\bm{C} \in [-1,1]^{n \times n}$ be a matrix such that $\lVert \bm{C} - \bm{C}(t)\rVert_{\infty} \leq \epsilon$, and assume $n \geq 3$. Let Assumptions assu:sample-independents, assu:errors-independents and assu:bias hold. Then, there exists an estimation procedure that, given $\bm{C}$ as input,

Figures (14)

  • Figure 1: We report the window size chosen by the algorithm (left) and the estimated accuracies of each weak labeler over time (right). The vertical lines represent when a change in distribution occurs.
  • Figure 2: We report the accuracy of different fixed-window-size strategies and our adaptive algorithm for the Tennis Rally dataset over time (single run, permute). For each time step $t$, the reported accuracy is an average over the next $128$ time steps. The plot shows that no fixed-window-size strategy is consistently good, while the adaptive strategy consistently matches the best strategy at any given time.
  • Figure 3: In the left plot, we report the average accuracy of the dynamically selected window sizes (our algorithm) and different fixed-window-size strategies for the Tennis Rally dataset. In the right plot, we report the histogram of the window sizes chosen by our algorithm. The reported results are for both experimental setups, permute and no permute. The averages and standard deviations are over $30$ random runs, where the randomness of each run is due to the abstentions and, for the permute setup, the shuffling of the weak labelers.
  • Figure 4: (Commercial). In the left plot, we report the average accuracy of the dynamically selected window sizes (our algorithm) and different fixed-window-size strategies for the Commercial Dataset. In the right plot, we report the histogram of the window sizes chosen by our algorithm.
  • Figure 5: (Basketball). We report the same results as in Figure 4 but for the Basketball dataset.
  • ...and 9 more figures

Theorems & Definitions (24)

  • Proposition 4.2: Lemma 9 of Bonald and Combes (2017)
  • Lemma 5.1
  • Theorem 5.2: Main Result
  • Theorem B.1
  • Proposition B.2
  • proof
  • Proposition B.3
  • proof
  • Proposition B.4
  • proof
  • ...and 14 more