Active Measurement: Efficient Estimation at Scale

Max Hamilton, Jinlin Lai, Wenlong Zhao, Subhransu Maji, Daniel Sheldon

TL;DR

Active measurement tackles precise scientific counting by combining AI predictions with iterative human labeling through adaptive importance sampling. The core idea is to maintain an unbiased Monte Carlo estimator of the total measurement while refining both the AI predictor and the sampling distribution as labels accrue, with principled weighting, variance estimation, and confidence intervals. The method reduces estimation error across diverse domains, including bird counts in radar data and malaria cell counting, and yields better-calibrated uncertainty than existing baselines. In practice, the framework enables accurate, scalable measurement with limited labeling effort, with potential for broader adoption in remote sensing, microscopy, and ecological monitoring.

Abstract

AI has the potential to transform scientific discovery by analyzing vast datasets with little human effort. However, current workflows often do not provide the accuracy or statistical guarantees that are needed. We introduce active measurement, a human-in-the-loop AI framework for scientific measurement. An AI model is used to predict measurements for individual units, which are then sampled for human labeling using importance sampling. With each new set of human labels, the AI model is improved and an unbiased Monte Carlo estimate of the total measurement is refined. Active measurement can provide precise estimates even with an imperfect AI model, and requires little human effort when the AI model is very accurate. We derive novel estimators, weighting schemes, and confidence intervals, and show that active measurement reduces estimation error compared to alternatives in several measurement tasks.
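The sampling step described in the abstract can be illustrated with a minimal sketch. This is not the paper's estimator: it is plain, non-adaptive importance sampling with replacement, whereas the paper refines the model and sampling distribution over iterations. The function name `importance_sampling_total`, the synthetic counts, and the noisy "detector" are all hypothetical and chosen only for illustration.

```python
import numpy as np

def importance_sampling_total(human_labels, ai_predictions, n_labels, rng):
    """Estimate the total F = sum_s f(s) from n_labels human-labeled units.

    human_labels   : array of true per-unit measurements f(s); only the sampled
                     entries are read, mimicking a limited labeling budget.
    ai_predictions : array of strictly positive per-unit predictions g(s).
    """
    # Proposal distribution proportional to the AI predictions.
    q = ai_predictions / ai_predictions.sum()
    # Draw the units to label (with replacement, for simplicity).
    idx = rng.choice(len(q), size=n_labels, replace=True, p=q)
    # Each labeled unit gives one unbiased estimate f(s) / q(s) of the total.
    ratios = human_labels[idx] / q[idx]
    estimate = ratios.mean()
    std_error = ratios.std(ddof=1) / np.sqrt(n_labels)
    return estimate, std_error

# Synthetic example (assumed data, not from the paper).
rng = np.random.default_rng(0)
true_counts = rng.poisson(5.0, size=1000).astype(float)
# Hypothetical imperfect detector: multiplicative noise plus a positive floor,
# so every unit keeps a nonzero chance of being sampled.
predictions = true_counts * rng.uniform(0.7, 1.3, size=1000) + 0.5

estimate, std_error = importance_sampling_total(true_counts, predictions, 100, rng)
print(f"true total   = {true_counts.sum():.0f}")
print(f"IS estimate  = {estimate:.1f} +/- {1.96 * std_error:.1f}")
print(f"raw AI total = {predictions.sum():.1f}")
```

In this sketch, the closer the predictions are to proportional to the true measurements, the less the ratios `f(s) / q(s)` vary, which is one way to see why few labels suffice when the AI model is accurate.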

Paper Structure

This paper contains 43 sections, 17 theorems, 64 equations, 14 figures, 3 algorithms.

Key Result

Proposition 1

The combined estimator $\hat{F}_{1:t}=\sum_{\tau=1}^{t}\bar{\alpha}_\tau \hat{F}_\tau$ is unbiased: $\mathbb{E}[\hat{F}_{1:t}]=F(\Omega)$.
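A short sketch of why a result of this form can hold, under two assumptions that are not stated in this excerpt: each per-step estimator is unbiased, $\mathbb{E}[\hat{F}_\tau]=F(\Omega)$, and the normalized weights $\bar{\alpha}_\tau$ are fixed in advance (not data-dependent) and sum to one. Linearity of expectation then gives

$$
\mathbb{E}[\hat{F}_{1:t}]
= \sum_{\tau=1}^{t} \bar{\alpha}_\tau\, \mathbb{E}[\hat{F}_\tau]
= F(\Omega) \sum_{\tau=1}^{t} \bar{\alpha}_\tau
= F(\Omega).
$$

Weights that depend on the data would need a separate argument, since they cannot be pulled out of the expectation in the first step.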

Figures (14)

  • Figure 1: Normalized weights as functions of $\tau$ for $t=700$ and $N=1000$.
  • Figure 2: Estimation error on two measurement tasks. Top: Fractional error of the estimated count as the percentage of labeled tiles increases, averaged over 10,000 runs, for counting birds in the "sky" and "reeds" images and for counting roosting birds at the KCLE and KGRR radar stations. Bottom: Fractional error of the estimated count after 200 labeled days for the 11 radar stations, using different estimators. The bottom-right table shows the geometric average fractional error across stations for the raw detector and for active measurement across iterations. Both adaptation and sampling without replacement are beneficial, and active measurement quickly outperforms the detector and baselines on both tasks.
  • Figure 3: Relative errors compared to $\alpha^{\mathrm{COMB}}$ weighting. Other fixed weighting strategies ($\alpha^{\mathrm{SQRT}}$, $\alpha^{\mathrm{LURE}}$) are worse, but inverse variance weighting (denoted by INV ($\gamma=0.5$)) may achieve lower error.
  • Figure 4: Coverage and radius (relative to the ground truth) of CIs on the roost data for station KDLH as a function of $t$ (from 5,000 replications), built with either of the variance estimators from § \ref{sec:variance}. The left panel uses the $\widehat{\text{Var}}^{\text{cond}}_{1:t}$ estimator. We achieve the desired coverage with narrower CIs.
  • Figure 5: Fractional error compared with other baselines ($\hat{H}$ is motivated by PPI).
  • ...and 9 more figures

Theorems & Definitions (27)

  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Proposition 4
  • Proposition 5: informal
  • Proposition 6
  • Proposition 7
  • Proposition 1
  • proof
  • Proposition 2
  • ...and 17 more