Human-in-the-Loop Visual Re-ID for Population Size Estimation

Gustavo Perez, Daniel Sheldon, Grant Van Horn, Subhransu Maji

TL;DR

The work tackles population size estimation in large image collections when Re-ID is imperfect. It introduces a human-in-the-loop estimator based on nested importance sampling that uses pairwise similarity to produce asymptotically unbiased counts $k=|\mathcal{Y}|$ with confidence intervals. It defines proposal distributions derived from approximate similarities and derives an estimator $\widehat{CC}_{N,M}$ that requires only $N\times M$ human queries, with proven asymptotic normality and bias $O(1/M)$. The approach outperforms strong baselines across seven animal datasets, delivering accurate estimates with very little human effort (often $<0.002\%$ of all pairs) and calibrated CIs, which makes it practical for wildlife monitoring and generalized category discovery. The method can be deployed on top of any Re-ID system and provides a principled way to quantify uncertainty and guide human labeling.

Abstract

Computer vision-based re-identification (Re-ID) systems are increasingly being deployed for estimating population size in large image collections. However, the estimated size can be significantly inaccurate when the task is challenging or when deployed on data from new distributions. We propose a human-in-the-loop approach for estimating population size driven by a pairwise similarity derived from an off-the-shelf Re-ID system. Our approach, based on nested importance sampling, selects pairs of images for human vetting driven by the pairwise similarity, and produces asymptotically unbiased population size estimates with associated confidence intervals. We perform experiments on various animal Re-ID datasets and demonstrate that our method outperforms strong baselines and active clustering approaches. In many cases, we are able to reduce the error rates of the estimated size from around 80% using CV alone to less than 20% by vetting a fraction (often less than 0.002%) of the total pairs. The cost of vetting reduces with the increase in accuracy and provides a practical approach for population size estimation within a desired tolerance when deploying Re-ID systems.

Paper Structure (29 sections, 2 theorems, 12 equations, 10 figures, 1 table, 1 algorithm)

This paper contains 29 sections, 2 theorems, 12 equations, 10 figures, 1 table, 1 algorithm.

Key Result

Lemma

Consider a graph $G=(V,E)$ with vertices $u,v \in V=\{1,...,n\}$ corresponding to images $x_u$, with an edge $e_{uv} \in E$ between $u$ and $v$ if the images $x_u$ and $x_v$ belong to the same cluster. Let $d(u) = |\{ e_{uv} \in E \}|$ be the degree of vertex $u$. Then the number of clusters $K$ in
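
The degree-based counting identity given in the Figure 2 caption, $k = \sum_{u=1}^n 1/(1+d(u))$, can be checked on a toy graph. The following is a minimal sketch, assuming a hypothetical helper `count_clusters` and a made-up label list whose cluster sizes (4, 3, 2) mirror the Figure 2 example; it uses exact fractions so the sum is not affected by floating-point rounding.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations


def count_clusters(n, edges):
    """Return sum_u 1 / (1 + d(u)) for an undirected graph on vertices 0..n-1."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Each vertex in a cluster of size c has degree c - 1 and contributes 1 / c.
    return sum(Fraction(1, 1 + degree[u]) for u in range(n))


# Illustrative ground-truth labels (an assumption): clusters of size 4, 3 and 2.
labels = [0, 0, 0, 0, 1, 1, 1, 2, 2]

# Connect every pair of vertices that share a cluster label.
edges = [
    (u, v)
    for u, v in combinations(range(len(labels)), 2)
    if labels[u] == labels[v]
]

k = count_clusters(len(labels), edges)
print(k)
```

Because every cluster is fully connected, the contributions within a cluster of size $c$ sum to $c \times 1/c = 1$, so the total equals the number of clusters.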

Figures (10)

  • Figure 1: Estimating Population Size Using a Re-ID System. (a) A simple approach involves using $k$-means clustering on image embeddings derived from the Re-ID system and selecting the optimal $k$ using the "elbow heuristic." (b) Active clustering (e.g., PCK-means; Basu et al., 2004) employs pairwise constraints to enhance clustering accuracy. (c) Our method leverages nested importance sampling to produce asymptotically unbiased estimates and confidence intervals on $k$ directly. (Right) On the MacaqueFaces dataset, our approach (Nested-IS) converges to the true $k=34$ with fewer constraints than alternative methods, and also provides confidence intervals for the estimate (shown as the shaded red region) for any amount of human feedback.
  • Figure 2: Counting clusters in a graph. The number of clusters is $k = \sum_{u=1}^n 1/(1+d(u))$, where $d(u)$ is the degree of node $u$. In this example $k = 4 \times 1/4 + 3 \times 1/3 + 2 \times 1/2 = 3$.
  • Figure 3: Proposed Framework for Counting Clusters in a Dataset. (a) We represent the dataset as a graph $G$ and estimate the number of connected components for an (unknown) pairwise similarity. (b) First, we compute an approximate similarity between images using an embedding. (c) We sample vertices $u_i$ from the distribution $Q(u)$, which biases the samples towards vertices with low (estimated) degrees. (d) We then sample nodes $v_{i,j}$ from $q_{u_i}(v)$, which is biased towards neighbors. (e) Human feedback on the sampled pairs is used to estimate the number of clusters with confidence intervals.
  • Figure 4: Performance of Estimating $k$ per Human Effort on Animal Re-ID Datasets. We use the cosine similarity built from the MegaDescriptor-L-384 image embeddings (see the evaluation section). The human effort is measured as the ratio of sampled pairs to the total number of pairs $|E|$ in the dataset $G$. Our method estimates the true count with less human effort than the baselines. Dashed lines indicate the mean estimates and shaded regions indicate the mean 95% confidence interval across 100 trials.
  • Figure 5: WhaleSharkID Dataset Statistics. (Left) The dataset is long-tailed, with many individuals having only a few images. (Center) Histogram of the vertex degree distribution; most individuals have fewer than 5 images. (Right) Sample images from the dataset.
  • ...and 5 more figures
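
The sampling steps in the Figure 3 caption can be sketched in code. This is a minimal illustration under stated assumptions, not the paper's exact estimator $\widehat{CC}_{N,M}$: the proposal forms $Q(u) \propto 1/(1+\tilde d(u))$ and $q_u(v) \propto \tilde s(u,v)$, the smoothing constant `eps`, the function name `nested_is_estimate`, the toy similarity matrix, and the normal-approximation interval computed from the outer samples are all assumptions made for this example. The human annotator is replaced by a hypothetical `oracle` that answers from ground-truth labels.

```python
import numpy as np


def nested_is_estimate(approx_sim, oracle, n_outer, n_inner, rng, eps=1e-3):
    """Nested importance sampling estimate of k = sum_u 1 / (1 + d(u)).

    approx_sim: (n, n) approximate similarity; oracle(u, v): True if same cluster.
    Uses n_outer * n_inner oracle queries in total.
    """
    n = approx_sim.shape[0]
    # Assumed smoothing: keep every pair reachable, and exclude self-pairs.
    s = np.clip(approx_sim, 0.0, None) + eps
    np.fill_diagonal(s, 0.0)

    # Assumed outer proposal: favour vertices with low estimated degree.
    approx_deg = s.sum(axis=1)
    Q = 1.0 / (1.0 + approx_deg)
    Q /= Q.sum()

    terms = np.empty(n_outer)
    for i, u in enumerate(rng.choice(n, size=n_outer, p=Q)):
        # Assumed inner proposal: favour likely neighbours of u.
        q = s[u] / s[u].sum()
        vs = rng.choice(n, size=n_inner, p=q)
        answers = np.array([oracle(u, v) for v in vs], dtype=float)
        # Importance-weighted estimate of the true degree d(u).
        deg_hat = np.mean(answers / q[vs])
        # Plug-in term; the nonlinearity 1 / (1 + d) is what introduces
        # the O(1 / n_inner) bias mentioned in the summary.
        terms[i] = 1.0 / ((1.0 + deg_hat) * Q[u])

    est = terms.mean()
    # Simplified 95% interval from the spread of the outer samples (assumption).
    half = 1.96 * terms.std(ddof=1) / np.sqrt(n_outer)
    return est, (est - half, est + half)


# Toy data (assumption): 9 images in clusters of size 4, 3 and 2.
labels = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2])
true_sim = (labels[:, None] == labels[None, :]).astype(float)
# Imperfect similarity: same-cluster pairs score 0.95, other pairs 0.05.
approx_sim = 0.9 * true_sim + 0.05

rng = np.random.default_rng(0)
est, ci = nested_is_estimate(
    approx_sim,
    oracle=lambda u, v: labels[u] == labels[v],
    n_outer=200,
    n_inner=50,
    rng=rng,
)
print(est, ci)
```

In this sketch the human effort is `n_outer * n_inner` pair queries, matching the $N \times M$ budget described in the TL;DR. A better approximate similarity concentrates the proposals on informative pairs and lowers the variance of the outer terms.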

Theorems & Definitions (6)

  • Lemma
  • Proof of the Lemma
  • Remark
  • Theorem
  • Proof of the Theorem
  • Remark