Dependency-aware Maximum Likelihood Estimation for Active Learning

Beyza Kalkanli, Tales Imbiriba, Stratis Ioannidis, Deniz Erdogmus, Jennifer Dy

TL;DR

This paper addresses the mismatch between sequential active learning and the i.i.d. assumption of traditional maximum likelihood estimation by introducing Dependency-aware Maximum Likelihood Estimation (DMLE). DMLE augments the standard likelihood with a dependency term that captures how previously acquired samples influence future selections, operationalized via the sampling distribution P(S_τ|D_{τ-1}; θ) and approximated using stochastic batch methods. The authors prove that DMLE increases Fisher information and reduces estimator variance, and they demonstrate substantial early-cycle accuracy gains across a wide range of datasets, acquisition functions, and batch sizes, with Wilcoxon tests confirming statistical significance in the majority of trials. This work suggests significant practical benefits for active learning by aligning parameter estimation with the inherently dependent data-generation process, potentially improving labeling efficiency in real-world settings.

Abstract

Active learning aims to efficiently build a labeled training set by strategically selecting samples to query labels from annotators. In this sequential process, each sample acquisition influences subsequent selections, causing dependencies among samples in the labeled set. However, these dependencies are overlooked during the model parameter estimation stage when updating the model using Maximum Likelihood Estimation (MLE), a conventional method that assumes independent and identically distributed (i.i.d.) data. We propose Dependency-aware MLE (DMLE), which corrects MLE within the active learning framework by addressing sample dependencies typically neglected due to the i.i.d. assumption, ensuring consistency with active learning principles in the model parameter estimation process. This improved method achieves superior performance across multiple benchmark datasets, reaching higher performance in earlier cycles compared to conventional MLE. Specifically, we observe average accuracy improvements of 6%, 8.6%, and 10.5% for k=1, k=5, and k=10 respectively, after collecting the first 100 samples, where entropy is the acquisition function and k is the query batch size acquired at every active learning cycle. Our implementation is publicly available at: https://github.com/neu-spiral/DMLEforAL
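The dependency term described above can be made concrete with a small sketch. The code below is an illustration under stated assumptions, not the authors' implementation: it assumes a linear softmax classifier with weight matrix `W`, entropy as the acquisition function, batch size k=1, and a softmax sampling distribution over the pool with a hypothetical temperature parameter `beta`. The function names (`imle_log_lik`, `dependency_term`, `dmle_log_lik`) are invented for this example. The sketch keeps the log-normalizer $\ln(Z_\tau)$ for clarity, whereas the Figure 5 caption indicates that the paper drops it for computational reasons.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def entropy_scores(W, X):
    # Predictive entropy of a linear softmax classifier (assumed model).
    p = softmax(X @ W)
    return -(p * np.log(p + 1e-12)).sum(axis=1)

def imle_log_lik(W, X, y):
    # Standard i.i.d. log-likelihood: sum of log p(y_i | x_i; theta).
    p = softmax(X @ W)
    return np.log(p[np.arange(len(y)), y] + 1e-12).sum()

def dependency_term(W, pools, picks, beta=1.0):
    # Sum over cycles of log P(S_tau | D_{tau-1}; theta), assuming k=1 and
    # softmax sampling: P(x) = exp(beta * a(x)) / Z_tau over the pool at cycle tau.
    total = 0.0
    for X_pool, idx in zip(pools, picks):
        a = beta * entropy_scores(W, X_pool)
        log_Z = np.log(np.exp(a - a.max()).sum()) + a.max()  # stable log-sum-exp
        total += a[idx] - log_Z
    return total

def dmle_log_lik(W, X, y, pools, picks, beta=1.0):
    # Dependency-aware objective in this sketch: i.i.d. term plus dependency term.
    return imle_log_lik(W, X, y) + dependency_term(W, pools, picks, beta)
```

Here `pools[tau]` holds the unlabeled pool available at cycle `tau` and `picks[tau]` is the index of the sample that was selected from it. Maximizing `dmle_log_lik` over `W` (for example with a gradient-based optimizer) would then favor parameters under which the observed selections were likely, in addition to fitting the labels.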

Paper Structure

This paper contains 26 sections, 2 theorems, 33 equations, 10 figures, 6 tables, 1 algorithm.

Key Result

Theorem 1

Let $I^{\text{IMLE}}(\theta)$ and $I^{\text{DMLE}}(\theta)$ be the Fisher Information matrices for IMLE and DMLE, respectively. As DMLE accounts for sample dependencies, $I^{\text{DMLE}}(\theta) \succeq I^{\text{IMLE}}(\theta).$ By the Cramér-Rao bound, this implies $\text{Cov}(\hat{\theta}^{\text{D

Figures (10)

  • Figure 1: Comparison of decision boundaries and selected samples across different learning scenarios: On the left, all 700 samples are used for model training without active learning. In the active learning experiments shown in the middle and right, the entropy acquisition function is used for sample selection, with one sample acquired at each cycle, resulting in a total of 140 samples. The middle setup employs IMLE for model parameter updates, while the right setup uses DMLE. The shaded regions represent the decision boundaries, and the yellow crosses in the active learning setups highlight the selected informative samples. Both the model trained with all samples using IMLE and the active learning model updated with DMLE achieve 99.5% accuracy, while the active learning model using IMLE for parameter updates achieves only 92% accuracy.
  • Figure 2: Components of the active learning procedure. In active learning, the model selects which samples to acquire, and its parameters are updated cyclically as the labeled set grows. First, uncertainty scores for the samples in the unlabeled pool $U_t$ are calculated using the current model with the acquisition function $a(x,D)$. Next, the samples $S_{t+1}$ chosen by the sample selection strategy are labeled by an oracle and added to the labeled set $D_t$, resulting in $D_{t+1}=D_t\cup \{(x,y),x\in S_{t+1}\}$. After this query sampling step, the updated labeled set is used to update the model with a model parameter estimation method, concluding one cycle.
  • Figure 3: Average test accuracy with $\pm 1$ standard deviation for DMLE and IMLE over cycles on the Iris, SVHN, Reuters, and Tiny ImageNet datasets for different sample selection strategies, namely Stochastic Softmax Sampling (SSMS), Stochastic Power Sampling (SPS), Stochastic Soft-rank Sampling (SSRS), and Top-$k$ Sampling. The sample selection size is $k=5$ for Tiny ImageNet and $k=1$ for all other datasets.
  • Figure 4: Comparison of DMLE and IMLE using the clustering-based Coreset approach with different sample selection strategies (SSMS, SPS, SSRS, and Top-$k$ Sampling). Average test accuracy with $\pm 1$ standard deviation is shown over cycles, with sample selection size $k=5$ in the first row and $k=10$ in the second row. The plots show that DMLE outperforms IMLE, particularly in the earlier cycles, consistent with our observations using uncertainty-based acquisition functions.
  • Figure 5: The change of the term $\sum_{\tau=1}^{t}\ln(Z_\tau)$ in the model parameter estimation objective over the active learning cycles for the MNIST dataset. The term changes only marginally across cycles, which, together with the computational cost of evaluating it, motivates dropping it from the model parameter estimation.
  • ...and 5 more figures
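
The cycle described in the Figure 2 caption (score the pool, select a batch, query the oracle, grow the labeled set, update the model) can be sketched as follows. This is a minimal illustration, not the authors' code: the callables `predict_proba`, `fit`, and `oracle`, the temperature `beta`, and the function names are all hypothetical, and selection is done by sampling without replacement with probability proportional to `exp(beta * score)`, which is one simple reading of stochastic softmax sampling.

```python
import numpy as np

def entropy(probs):
    # Predictive entropy per pool sample; probs has shape (n_pool, n_classes).
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

def select_batch(scores, k, beta=1.0, rng=None):
    # Draw k distinct pool indices with probability proportional to exp(beta * score).
    rng = np.random.default_rng() if rng is None else rng
    w = np.exp(beta * (scores - scores.max()))
    return rng.choice(len(scores), size=k, replace=False, p=w / w.sum())

def run_cycle(predict_proba, fit, X_lab, y_lab, X_pool, oracle, k=1, rng=None):
    # 1) Score the unlabeled pool U_t with the current model.
    scores = entropy(predict_proba(X_pool))
    # 2) Select the batch S_{t+1}.
    picked = select_batch(scores, k, rng=rng)
    # 3) Query the oracle and form D_{t+1} from D_t and the new labeled samples.
    X_new = X_pool[picked]
    y_new = oracle(X_new)
    X_lab = np.vstack([X_lab, X_new])
    y_lab = np.concatenate([y_lab, y_new])
    X_pool = np.delete(X_pool, picked, axis=0)
    # 4) Update the model on the enlarged labeled set (IMLE, DMLE, or other).
    fit(X_lab, y_lab)
    return X_lab, y_lab, X_pool
```

Calling `run_cycle` repeatedly, feeding each call's outputs into the next, gives the full active learning loop. The parameter estimation method enters only through `fit`, which is where an i.i.d. or a dependency-aware objective would be plugged in.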

Theorems & Definitions (2)

  • Theorem 1
  • Theorem 2