Contextual Active Model Selection

Xuefeng Liu, Fangfang Xia, Rick L. Stevens, Yuxin Chen

TL;DR

This work introduces CAMS, a Contextual Active Model Selection framework that jointly leverages context-aware online model selection and an adaptive query strategy to minimize labeling costs while selecting among pre-trained classifiers. CAMS operates under both stochastic and adversarial data streams, providing regret bounds and sublinear query-complexity guarantees, and unifies contextual bandits, online learning, and active learning in a streaming setting. Theoretical results show constant pseudo-regret in the stochastic regime and sublinear regret in the adversarial regime, complemented by practical demonstrations of substantial label-efficiency (e.g., <10% labeling on CIFAR10 and DRIFT) across diverse benchmarks including CIFAR10, VERTEBRAL, HIV, and ImageNet. Empirically, CAMS outperforms a wide range of baselines, adapts to heterogeneous policy sets, and remains robust to malicious experts, offering a scalable solution for context-driven model selection with limited labeling resources in real-world deployments.

Abstract

While training models and labeling data are resource-intensive, a wealth of pre-trained models and unlabeled data exists. To effectively utilize these resources, we present an approach to actively select pre-trained models while minimizing labeling costs. We frame this as an online contextual active model selection problem: At each round, the learner receives an unlabeled data point as a context. The objective is to adaptively select the best model to make a prediction while limiting label requests. To tackle this problem, we propose CAMS, a contextual active model selection algorithm that relies on two novel components: (1) a contextual model selection mechanism, which leverages context information to make informed decisions about which model is likely to perform best for a given context, and (2) an active query component, which strategically chooses when to request labels for data points, minimizing the overall labeling cost. We provide rigorous theoretical analysis for the regret and query complexity under both adversarial and stochastic settings. Furthermore, we demonstrate the effectiveness of our algorithm on a diverse collection of benchmark classification tasks. Notably, CAMS requires substantially less labeling effort (less than 10%) compared to existing methods on CIFAR10 and DRIFT benchmarks, while achieving similar or better accuracy. Our code is publicly available at: https://github.com/xuefeng-cs/Contextual-Active-Model-Selection.
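The interaction loop described in the abstract can be illustrated with a small sketch. This is not the authors' CAMS implementation (see the linked repository for that); it is a simplified stand-in that assumes exponential weights over policies, a disagreement-based query probability, and importance-weighted loss updates. The function name `select_models_actively` and the parameters `eta` and `min_query_prob` are hypothetical choices made for illustration.

```python
import math
import random


def select_models_actively(stream, models, policies, eta=0.5,
                           min_query_prob=0.05, seed=0):
    """Simplified contextual active model selection loop (illustrative only).

    stream   : list of (context, label) pairs; a label influences the learner
               only in rounds where the learner decides to query it
    models   : list of callables, context -> predicted label
    policies : list of callables, context -> list of probabilities over models
    """
    rng = random.Random(seed)
    cum_loss = [0.0] * len(policies)  # importance-weighted loss per policy
    queries = 0
    mistakes = 0

    for context, label in stream:
        # Exponential weights over policies (shifted for numerical stability).
        low = min(cum_loss)
        raw = [math.exp(-eta * (c - low)) for c in cum_loss]
        total = sum(raw)
        policy_probs = [r / total for r in raw]

        # Mix the policies' advice into one distribution over models.
        advice = [policy(context) for policy in policies]
        model_probs = [
            sum(q * a[j] for q, a in zip(policy_probs, advice))
            for j in range(len(models))
        ]

        # Predict with the most probable model (assumption: greedy selection).
        preds = [model(context) for model in models]
        chosen = max(range(len(models)), key=lambda j: model_probs[j])
        mistakes += int(preds[chosen] != label)  # bookkeeping for evaluation only

        # Unanimous predictions: the label cannot separate the policies.
        if len(set(preds)) == 1:
            continue

        # Query rule (assumption): query more often when weighted models disagree.
        mass = {}
        for p, y_hat in zip(model_probs, preds):
            mass[y_hat] = mass.get(y_hat, 0.0) + p
        query_prob = max(min_query_prob, 1.0 - max(mass.values()))

        if rng.random() < query_prob:
            queries += 1
            model_loss = [float(y_hat != label) for y_hat in preds]
            for i, a in enumerate(advice):
                expected = sum(a_j * l_j for a_j, l_j in zip(a, model_loss))
                cum_loss[i] += expected / query_prob  # importance weighting

    return {"queries": queries, "mistakes": mistakes, "cum_loss": cum_loss}


# Toy usage: one accurate model, one constant model, one policy per model.
models = [lambda x: int(x > 0), lambda x: 0]
policies = [lambda x: [1.0, 0.0], lambda x: [0.0, 1.0]]
stream = [(x, int(x > 0)) for x in ((-1) ** t * (t + 1) for t in range(200))]
result = select_models_actively(stream, models, policies)
print(result["queries"], result["mistakes"])
```

In this toy run the two models disagree only on positive contexts, so labels are requested only there, and the query probability shrinks toward `min_query_prob` as the weight of the accurate policy grows. The actual query criterion and update rule in CAMS may differ from this sketch.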

Paper Structure

This paper contains 22 sections, 4 theorems, 4 equations, 4 figures, and 2 tables.

Key Result

Theorem 1

(Regret) In the stochastic environment, with probability at least $1-\delta$, CAMS achieves constant expected pseudo-regret $\overline{\mathcal{R}}_T\left(\textsc{CAMS}\right) = \left(\frac{\ln\frac{|\Pi^*|-1}{\gamma} + \sqrt{\ln|\Pi^*| \cdot 2b^2 \ln\frac{2}{\delta}}}{\sq

Figures (4)

  • Figure 1: The Contextual Active Model Selection (CAMS) algorithm
  • Figure 2: Main results. Comparison of CAMS with 7 baselines across 4 diverse benchmarks in terms of cost-effectiveness. We plot the cumulative loss as the query cost increases, for a fixed number of rounds $T$ and maximal query cost $B$ (from left to right: $T=10000, 3000, 80, 4000$ and $B=1200, 2000, 80, 2000$). CAMS outperforms all baselines. Algorithms: 4 contextual {Oracle, CQBC, CIWAL, CAMS} and 4 non-contextual {RS, QBC, IWAL, MP} algorithms are included. Shaded regions indicate 90% confidence intervals.
  • Figure 3: Ablation studies. (a) Comparing three query strategies {CAMS, variance-based, random} under the same model selection policy. (b) Comparing the growth rate of CAMS's query cost against other baselines. (c) Comparing CAMS with MP in a context-free environment. (d) Evaluating the performance of CAMS under a purely adversarial setting. (e) Large dataset. (f,g) Adjustable query probability. (h) CAMS outperforms the best single policy. Ablation studies (a)-(d) are conducted on CIFAR10. For additional results on other benchmarks, please refer to the supplemental material.
  • Figure 4: Comparison of CAMS with 7 baselines on the IMAGENET benchmark in terms of cost-effectiveness. We plot the cumulative loss as the query cost increases, for a fixed number of rounds $T$ and maximal query cost $B$ ($T=3000$ and $B=2500$). CAMS outperforms all baselines. Algorithms: 4 contextual {Oracle, CQBC, CIWAL, CAMS} and 4 non-contextual {RS, QBC, IWAL, MP} algorithms are included. Shaded regions indicate 90% confidence intervals.
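The captions above report 90% confidence intervals without stating how they are computed. As a minimal sketch, assuming a normal approximation over repeated runs (an assumption, not the paper's stated procedure), one point of such a curve could be summarized as follows. The helper name `mean_with_ci` and the sample values are hypothetical.

```python
import math
import statistics


def mean_with_ci(values, z=1.645):
    """Return (mean, half_width) of a normal-approximation confidence interval.

    z=1.645 corresponds to a two-sided 90% interval under the normal
    approximation; the sample standard deviation is used.
    """
    mean = statistics.mean(values)
    half_width = z * statistics.stdev(values) / math.sqrt(len(values))
    return mean, half_width


# Hypothetical cumulative losses of one algorithm at one query budget, 5 runs.
losses = [1.0, 2.0, 3.0, 4.0, 5.0]
mean, half_width = mean_with_ci(losses)
print(f"{mean:.2f} +/- {half_width:.2f}")
```

Repeating this for each query budget gives the center line and the shaded band of a cost-effectiveness plot. With few runs, a Student-t quantile would be a more conservative choice than the fixed `z`.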

Theorems & Definitions (4)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4