FIRAL: An Active Learning Algorithm for Multinomial Logistic Regression
Youguang Chen, George Biros
TL;DR
FIRAL addresses pool-based active learning for multiclass classification with multinomial logistic regression by tying excess-risk control to the Fisher Information Ratio (FIR) between the unlabeled distribution and a chosen sampling distribution. It introduces a two-stage method for minimizing the FIR. The first stage solves a convex relaxation of the FIR minimization problem. The second stage is a regret-minimization rounding step based on follow-the-regularized-leader (FTRL) that selects the actual points to label, with provable $(1+\epsilon)$-approximation guarantees. Theoretical results give finite-sample upper and lower bounds on the excess risk in terms of the FIR under sub-Gaussian assumptions, complemented by an analysis for bounded domains. Empirical results on MNIST, CIFAR-10, and ImageNet-50 show that FIRAL consistently outperforms several baselines, particularly in low-sample regimes, which makes it a practical choice for multiclass active learning.
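The FIR that both stages minimize can be made concrete with a small computation. The sketch below assumes the standard Fisher information of multinomial logistic regression with the last of the $c$ classes used as a reference (logit fixed at 0), so the parameters `theta` form a $(c-1) \times d$ matrix. At a point $x$ with predicted probabilities $h$ for the first $c-1$ classes, this gives $H(x) = (\operatorname{diag}(h) - h h^\top) \otimes x x^\top$. It also takes $\mathrm{FIR} = \operatorname{tr}(H_S^{-1} H_U)$, where $H_S$ and $H_U$ average $H(x)$ over the selected points and over the pool. The function names and the small `ridge` term added for invertibility are hypothetical illustration choices, not the authors' code.

```python
import math


def probs_reduced(theta, x):
    """Probabilities of classes 1..c-1 when the last class has logit 0 (assumed parametrization)."""
    logits = [sum(t * xi for t, xi in zip(row, x)) for row in theta]
    m = max(0.0, max(logits))  # shift by the largest logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    denom = math.exp(-m) + sum(exps)  # exp(-m) is the reference class term
    return [e / denom for e in exps]


def kron(a, b):
    """Kronecker product of two dense matrices stored as lists of lists."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def point_fisher(theta, x):
    """Fisher information (diag(h) - h h^T) kron (x x^T) at a single point x."""
    h = probs_reduced(theta, x)
    cov = [[(h[i] if i == j else 0.0) - h[i] * h[j] for j in range(len(h))]
           for i in range(len(h))]
    outer = [[xi * xj for xj in x] for xi in x]
    return kron(cov, outer)


def mean_fisher(theta, points, ridge=0.0):
    """Average Fisher information over `points`, plus ridge * I on the diagonal."""
    mats = [point_fisher(theta, x) for x in points]
    n = len(mats[0])
    return [[sum(m[i][j] for m in mats) / len(mats) + (ridge if i == j else 0.0)
             for j in range(n)] for i in range(n)]


def trace_solve(a, b):
    """Return trace(a^{-1} b) using Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    # The left block is now the identity and the right block holds a^{-1} b.
    return sum(aug[i][n + i] for i in range(n))


def fisher_information_ratio(theta, selected, pool, ridge=1e-6):
    """FIR = trace(H_selected^{-1} H_pool) at a fixed parameter estimate theta."""
    h_sel = mean_fisher(theta, selected, ridge)
    h_pool = mean_fisher(theta, pool)
    return trace_solve(h_sel, h_pool)


# Hypothetical example: c = 3 classes, d = 2 features, a 4-point pool.
theta = [[0.5, -0.2], [0.1, 0.3]]
pool = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]]
print(fisher_information_ratio(theta, pool, pool))      # close to 4.0 = (c - 1) * d
print(fisher_information_ratio(theta, pool[:2], pool))  # FIR of a 2-point selection
```

Selecting the whole pool makes $H_S \approx H_U$, so the FIR is close to the parameter dimension $(c-1)d$. A good selection keeps the FIR small while using far fewer labeled points.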
Abstract
We investigate theory and algorithms for pool-based active learning for multiclass classification using multinomial logistic regression. Using finite-sample analysis, we prove that the Fisher Information Ratio (FIR) bounds the excess risk from both above and below. Based on this analysis, we propose FIRAL, an active learning algorithm that employs regret minimization to minimize the FIR. To verify our derived excess risk bounds, we conduct experiments on synthetic datasets. Furthermore, we compare FIRAL with five other methods and find that it outperforms them, consistently producing the smallest classification error in the multiclass logistic regression setting on MNIST, CIFAR-10, and 50-class ImageNet.
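Choosing the labeled set that minimizes the FIR is a combinatorial problem. One common way to write the continuous relaxation mentioned in the TL;DR is shown below. It is an assumed formulation, not necessarily the authors' exact one. The subset indicator is replaced by weights $z_i$ over the $n$ pool points under a labeling budget $b$, and $h_i$ denotes the predicted probabilities of the first $c-1$ classes at $x_i$:

```latex
H_i = \bigl(\operatorname{diag}(h_i) - h_i h_i^{\top}\bigr) \otimes x_i x_i^{\top},
\qquad
H_U = \frac{1}{n}\sum_{i=1}^{n} H_i,
\\[4pt]
\min_{z \in [0,1]^n,\; \mathbf{1}^{\top} z = b}\;
f(z) = \operatorname{tr}\bigl(\Sigma_z^{-1} H_U\bigr),
\qquad
\Sigma_z = \frac{1}{b}\sum_{i=1}^{n} z_i H_i .
```

The objective is convex because $X \mapsto \operatorname{tr}(X^{-1} H_U)$ is convex on positive definite matrices and $\Sigma_z$ is affine in $z$. Its gradient follows from $d(X^{-1}) = -X^{-1}\,dX\,X^{-1}$: $\partial f / \partial z_i = -\tfrac{1}{b}\operatorname{tr}\bigl(\Sigma_z^{-1} H_i \Sigma_z^{-1} H_U\bigr)$. Turning a fractional $z$ back into $b$ concrete points is the step that FIRAL handles with regret minimization.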
