An active learning framework for multi-group mean estimation

Abdellah Aznag, Rachel Cummings, Adam N. Elmachtoub

TL;DR

The paper proposes Variance-UCB, an algorithm that sequentially selects groups according to an upper confidence bound on the variance estimate, and provides a general theoretical framework for deriving efficient bounds on learning from any underlying distribution whose variances can be estimated reasonably.

Abstract

We study a fundamental learning problem over multiple groups with unknown data distributions, where an analyst would like to learn the mean of each group. Moreover, we want to ensure that this data is collected in a relatively fair manner such that the noise of the estimate of each group is reasonable. In particular, we focus on settings where data are collected dynamically, which is important in adaptive experimentation for online platforms or adaptive clinical trials for healthcare. In our model, we employ an active learning framework to sequentially collect samples with bandit feedback, observing a sample in each period from the chosen group. After observing a sample, the analyst updates their estimate of the mean and variance of that group and chooses the next group accordingly. The analyst's objective is to dynamically collect samples to minimize the collective noise of the estimators, measured by the norm of the vector of variances of the mean estimators. We propose an algorithm, Variance-UCB, that sequentially selects groups according to an upper confidence bound on the variance estimate. We provide a general theoretical framework for providing efficient bounds on learning from any underlying distribution where the variances can be estimated reasonably. This framework yields upper bounds on regret that improve significantly upon all existing bounds, as well as a collection of new results for different objectives and distributions than those previously studied.
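The sampling loop described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's Variance-UCB: the confidence width `width_scale * sqrt(log(horizon) / n)` and the selection score (upper confidence bound on the variance divided by the sample count, which matches the infinity-norm intuition) are assumptions made here, and the names `RunningStats`, `variance_ucb_sketch`, and `width_scale` are hypothetical.

```python
import math
import random


class RunningStats:
    """Online mean and sample variance via Welford's update."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self.m2 / (self.n - 1)


def _score(st, horizon, width_scale):
    # Assumed width: shrinks like sqrt(log(horizon) / n). Not taken from the paper.
    ucb = st.variance + width_scale * math.sqrt(math.log(horizon) / st.n)
    # Optimistic estimate of the variance of this group's mean estimator.
    return ucb / st.n


def variance_ucb_sketch(samplers, horizon, width_scale=1.0, seed=0):
    """Collect `horizon` samples, one per period, from the chosen group."""
    if horizon < 2 * len(samplers):
        raise ValueError("need at least two samples per group")
    rng = random.Random(seed)
    stats = [RunningStats() for _ in samplers]
    # Two initial samples per group so that a sample variance is defined.
    for draw, st in zip(samplers, stats):
        st.add(draw(rng))
        st.add(draw(rng))
    for _ in range(horizon - 2 * len(samplers)):
        # Bandit feedback: only the chosen group yields a new observation.
        g = max(range(len(stats)),
                key=lambda i: _score(stats[i], horizon, width_scale))
        stats[g].add(samplers[g](rng))
    return [st.n for st in stats], [st.mean for st in stats]


if __name__ == "__main__":
    groups = [lambda r: r.gauss(0.0, 1.0), lambda r: r.gauss(5.0, 3.0)]
    counts, means = variance_ucb_sketch(groups, horizon=2000)
    print(counts, means)
```

Under these assumptions, the noisier group should receive more samples, since its mean estimator would otherwise dominate the collective noise.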

Paper Structure

This paper contains 32 sections, 37 theorems, 225 equations, 2 figures, 1 table, 1 algorithm.

Key Result

Lemma 2.1

For any norm parameter $p \in [1, +\infty]$, $R_p(\boldsymbol{n};\boldsymbol{\sigma}) = R^*_p(\boldsymbol{\sigma})$ uniquely at $\boldsymbol{n}^*_T := T \left(\frac{\sigma_g^{\frac{2p}{p+1}}}{\sum_{h \in [G]}\sigma_h^{\frac{2p}{p+1}}}\right)_{g \in [G]}$.
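The allocation $\boldsymbol{n}^*_T$ can be computed directly from the formula above. The sketch below assumes that $R_p(\boldsymbol{n};\boldsymbol{\sigma})$ is the $p$-norm of the vector $(\sigma_g^2 / n_g)_{g \in [G]}$, as the abstract's description of the objective suggests, and treats $p = +\infty$ as the limiting exponent $2$. The function names `optimal_allocation` and `p_norm_risk` are hypothetical, and the allocation is left fractional rather than rounded to integers.

```python
import math


def optimal_allocation(sigmas, horizon, p):
    """n*_T with entries T * sigma_g^(2p/(p+1)) / sum_h sigma_h^(2p/(p+1))."""
    # As p -> infinity, 2p / (p + 1) -> 2.
    exponent = 2.0 if math.isinf(p) else 2.0 * p / (p + 1.0)
    weights = [s ** exponent for s in sigmas]
    total = sum(weights)
    return [horizon * w / total for w in weights]


def p_norm_risk(counts, sigmas, p):
    """Assumed objective: p-norm of the variances of the mean estimators."""
    variances = [s * s / n for s, n in zip(sigmas, counts)]
    if math.isinf(p):
        return max(variances)
    return sum(v ** p for v in variances) ** (1.0 / p)


if __name__ == "__main__":
    sigmas = [1.0, 2.0]
    for p in (1.0, 2.0, math.inf):
        n_star = optimal_allocation(sigmas, horizon=90, p=p)
        print(p, n_star, p_norm_risk(n_star, sigmas, p))
```

With standard deviations $(1, 2)$ and $T = 90$, the case $p = 1$ gives the allocation $(30, 60)$ and the case $p = +\infty$ gives $(18, 72)$, which equalizes the two variances $\sigma_g^2 / n_g$.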

Figures (2)

  • Figure 1: Illustration of admissible widths for bounding the random gap $\boldsymbol{{\sf UCB}} - \boldsymbol{\sigma}$ (red). The larger function $\boldsymbol{{\sf w}}_1$ (blue) yields valid but loose bounds; the smaller (tighter) function $\boldsymbol{{\sf w}}_2$ (purple) underestimates uncertainty and may violate validity. The ideal width $\boldsymbol{{\sf w}}^*$ (black, dashed) provides a convexified, data-agnostic upper bound that closely tracks the average behavior of the random gap.
  • Figure 2: Main dynamics in bounding $\delta_{\max}$. Using the structure of V-UCB, Lemma \ref{lemma:potential} shows that $\delta_{\max}$ is a postfixed point of $F$, meaning that it lies either in the green area $\mathcal{C}_{\sf good}$ or in the red area $\mathcal{C}_{\sf bad}$. To rule out the red area, we use an initial upper bound $f_0$ obtained by classic tail-bounding arguments (Lemma \ref{lemma:initialpoint}). This improves the upper bound from $f^{\infty}$ to $\sup \mathcal{C}_{\sf good}$, which can be calculated as a limit of the dynamical system described by $F_T$ in Proposition \ref{prop:improvedrates}.

Theorems & Definitions (61)

  • Lemma 2.1: Complete information optimal policy
  • Definition 1: UCB-procedure
  • Definition 2: Admissible width
  • Definition 3: Decision error
  • Theorem 4.1: Regret bounds for the infinite norm
  • Theorem 4.2: Regret bound for finite $p$-norms
  • Theorem 5.1: Sub-Gaussian feedback
  • Corollary 5.1: Open question in carpentier2011upper
  • proof
  • Theorem 5.2: Gaussian feedback
  • ...and 51 more