Contextual Active Model Selection
Xuefeng Liu, Fangfang Xia, Rick L. Stevens, Yuxin Chen
TL;DR
This work introduces CAMS, a Contextual Active Model Selection framework that combines context-aware online model selection with an adaptive query strategy to select among pre-trained classifiers while minimizing labeling cost. CAMS handles both stochastic and adversarial data streams, comes with regret bounds and sublinear query-complexity guarantees, and unifies contextual bandits, online learning, and active learning in a streaming setting. The theory gives constant pseudo-regret in the stochastic regime and sublinear regret in the adversarial regime. In experiments on benchmarks including CIFAR10, DRIFT, VERTEBRAL, HIV, and ImageNet, CAMS is markedly label-efficient (e.g., <10% labeling on CIFAR10 and DRIFT), outperforms a wide range of baselines, adapts to heterogeneous policy sets, and remains robust to malicious experts, making it a scalable option for context-driven model selection in real-world deployments with limited labeling resources.
Abstract
While training models and labeling data are resource-intensive, a wealth of pre-trained models and unlabeled data exists. To effectively utilize these resources, we present an approach to actively select pre-trained models while minimizing labeling costs. We frame this as an online contextual active model selection problem: at each round, the learner receives an unlabeled data point as a context, and the objective is to adaptively select the best model to make a prediction while limiting label requests. To tackle this problem, we propose CAMS, a contextual active model selection algorithm that relies on two novel components: (1) a contextual model selection mechanism, which uses context information to decide which model is likely to perform best for a given context, and (2) an active query component, which strategically chooses when to request labels for data points, minimizing the overall labeling cost. We provide rigorous theoretical analysis of the regret and query complexity under both adversarial and stochastic settings. Furthermore, we demonstrate the effectiveness of our algorithm on a diverse collection of benchmark classification tasks. Notably, CAMS requires substantially less labeling effort (less than 10%) than existing methods on the CIFAR10 and DRIFT benchmarks, while achieving similar or better accuracy. Our code is publicly available at: https://github.com/xuefeng-cs/Contextual-Active-Model-Selection.
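The round-by-round protocol described in the abstract (receive a context, pick a model, decide whether to ask for a label) can be illustrated with a small sketch. This is not the CAMS algorithm: the exponential-weights update over policies, the rule "query only when the models disagree", and every name below (`run_selection`, `eta`, the toy models, policies, and stream) are assumptions introduced here for illustration, standing in for the paper's actual selection and query components.

```python
import math
from typing import Callable, Iterable, List, Sequence, Tuple

Context = Sequence[float]
Model = Callable[[Context], int]            # context -> predicted label
Policy = Callable[[Context], List[float]]   # context -> distribution over models


def run_selection(
    stream: Iterable[Tuple[Context, int]],
    models: List[Model],
    policies: List[Policy],
    eta: float = 1.0,  # hypothetical learning rate, not a value from the paper
) -> Tuple[List[int], int]:
    """Illustrative streaming loop; returns (predictions, number of label queries)."""
    cum_loss = [0.0] * len(policies)  # cumulative expected loss of each policy
    predictions: List[int] = []
    n_queries = 0
    for x, y in stream:  # y stays hidden unless a query is issued
        # Exponential weights over policies (shifted by the minimum for stability).
        base = min(cum_loss)
        weights = [math.exp(-eta * (loss - base)) for loss in cum_loss]
        total = sum(weights)
        # Each policy recommends a distribution over models for this context.
        advice = [policy(x) for policy in policies]
        model_probs = [
            sum(w * a[j] for w, a in zip(weights, advice)) / total
            for j in range(len(models))
        ]
        # Predict with the model that currently has the highest mixed probability.
        chosen = max(range(len(models)), key=lambda j: model_probs[j])
        preds = [model(x) for model in models]
        predictions.append(preds[chosen])
        # Assumed query rule: request the label only if the models disagree.
        if len(set(preds)) > 1:
            n_queries += 1
            model_loss = [float(p != y) for p in preds]
            for i, a in enumerate(advice):
                cum_loss[i] += sum(ai * li for ai, li in zip(a, model_loss))
    return predictions, n_queries


# Toy setup (all hypothetical): one constant model and one threshold model,
# with one policy that always recommends each of them.
toy_models = [lambda x: 0, lambda x: int(x[0] > 0)]
toy_policies = [lambda x: [1.0, 0.0], lambda x: [0.0, 1.0]]
toy_stream = [([-1.0], 0), ([1.0], 1), ([-2.0], 0), ([2.0], 1), ([3.0], 1)]
demo_predictions, demo_queries = run_selection(toy_stream, toy_models, toy_policies)
print(demo_predictions, demo_queries)
```

In this toy run, labels are requested only on the three rounds where the two models disagree, and after the first queried mistake the weight shifts to the policy that recommends the threshold model. A context-dependent policy would return different distributions for different `x`, which is the aspect the sketch leaves out.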
