Dependency-aware Maximum Likelihood Estimation for Active Learning
Beyza Kalkanli, Tales Imbiriba, Stratis Ioannidis, Deniz Erdogmus, Jennifer Dy
TL;DR
This paper addresses the mismatch between sequential active learning and the i.i.d. assumption of traditional maximum likelihood estimation by introducing Dependency-aware Maximum Likelihood Estimation (DMLE). DMLE augments the standard likelihood with a dependency term that captures how previously acquired samples influence future selections. The term is operationalized via the sampling distribution P(S_τ|D_{τ-1}; θ) and approximated using stochastic batch methods. The authors prove that DMLE increases Fisher information and reduces estimator variance, and they demonstrate substantial early-cycle accuracy gains across a wide range of datasets, acquisition functions, and batch sizes, with Wilcoxon tests confirming statistical significance in the majority of trials. By aligning parameter estimation with the inherently dependent data-generation process, DMLE may improve labeling efficiency in real-world active learning settings.
Abstract
Active learning aims to efficiently build a labeled training set by strategically selecting the samples whose labels are queried from annotators. In this sequential process, each sample acquisition influences subsequent selections, causing dependencies among samples in the labeled set. However, these dependencies are overlooked during the model parameter estimation stage when the model is updated using Maximum Likelihood Estimation (MLE), a conventional method that assumes independent and identically distributed (i.i.d.) data. We propose Dependency-aware MLE (DMLE), which corrects MLE within the active learning framework by addressing the sample dependencies typically neglected due to the i.i.d. assumption, ensuring consistency with active learning principles in the model parameter estimation process. DMLE outperforms conventional MLE across multiple benchmark datasets and reaches higher performance in earlier cycles. Specifically, we observe average accuracy improvements of 6%, 8.6%, and 10.5% for k=1, k=5, and k=10, respectively, after collecting the first 100 samples, where entropy is the acquisition function and k is the query batch size acquired at every active learning cycle. Our implementation is publicly available at: https://github.com/neu-spiral/DMLEforAL
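
To make the idea of a dependency term concrete, the following is a minimal illustrative sketch and not the authors' implementation (see the repository above for that). It assumes a linear softmax classifier, entropy as the acquisition function, and a softmax over acquisition scores with a hypothetical temperature `beta` as one possible model of the sampling distribution P(S_τ|D_{τ-1}; θ). For batches it further assumes the selections are scored as independent draws, which is a simplification. All function names are hypothetical.

```python
import numpy as np

def predict_proba(theta, X):
    """Class probabilities of a linear softmax classifier; theta has shape (d, c)."""
    logits = X @ theta
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    expl = np.exp(logits)
    return expl / expl.sum(axis=1, keepdims=True)

def entropy_scores(theta, X):
    """Predictive entropy of each row of X under the current parameters."""
    p = predict_proba(theta, X)
    return -(p * np.log(p + 1e-12)).sum(axis=1)

def log_softmax(v):
    """Numerically stable log-softmax of a 1-D array."""
    v = v - v.max()
    return v - np.log(np.exp(v).sum())

def dmle_log_likelihood(theta, cycles, beta=1.0):
    """Return (label term, dependency term) summed over active learning cycles.

    cycles: list of (X_pool, selected_idx, y_selected), where X_pool is the
    unlabeled pool at that cycle, selected_idx indexes the acquired samples,
    and y_selected holds their integer labels.
    """
    label_ll, dep_ll = 0.0, 0.0
    for X_pool, sel, y in cycles:
        # Standard MLE term: log-probability of the observed labels.
        p = predict_proba(theta, X_pool[sel])
        label_ll += np.log(p[np.arange(len(sel)), y] + 1e-12).sum()
        # Assumed dependency term: log-probability of having selected these
        # samples, modeled as a softmax over entropy scores on the pool.
        log_sel = log_softmax(beta * entropy_scores(theta, X_pool))
        dep_ll += log_sel[sel].sum()
    return label_ll, dep_ll

# Toy usage with k=1: two cycles, the pool shrinks after the first acquisition.
rng = np.random.default_rng(0)
X_pool = rng.normal(size=(20, 3))
cycles = [
    (X_pool, np.array([4]), np.array([1])),
    (np.delete(X_pool, 4, axis=0), np.array([7]), np.array([0])),
]
theta0 = np.zeros((3, 2))
label_ll, dep_ll = dmle_log_likelihood(theta0, cycles)
objective = label_ll + dep_ll  # quantity a dependency-aware estimator would maximize
```

Dropping `dep_ll` from the objective recovers the conventional i.i.d. log-likelihood, which is the comparison the abstract draws. In practice the objective would be maximized over `theta` with a gradient-based optimizer; that step is omitted here.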
