Bandit Guided Submodular Curriculum for Adaptive Subset Selection

Prateek Chanda, Prayas Agrawal, Saral Sureka, Lokesh Reddy Polu, Atharv Kshirsagar, Ganesh Ramakrishnan

TL;DR

This work reframes adaptive data subset selection for curriculum learning as a multi-armed bandit problem where each arm is a submodular function guiding sample selection. It introduces OnlineSubmod, a no-regret, greedy policy that optimizes a validation-driven reward to adaptively schedule curriculum across training. Theoretical guarantees establish regret bounds and convergence to the best arm, while extensive experiments show superior accuracy-efficiency tradeoffs on vision and language tasks with modest overhead from submodular selection. The approach emphasizes principled, validation-informed curriculum dynamics and demonstrates scalable data-efficient training for large models.

Abstract

Traditional curriculum learning proceeds from easy to hard samples, yet defining a reliable notion of difficulty remains elusive. Prior work has used submodular functions to induce difficulty scores in curriculum learning. We reinterpret adaptive subset selection and formulate it as a multi-armed bandit problem, where each arm corresponds to a submodular function guiding sample selection. We introduce ONLINESUBMOD, a novel online greedy policy that optimizes a utility-driven reward and provably achieves no-regret performance under various sampling regimes. Empirically, ONLINESUBMOD outperforms both traditional curriculum learning and bi-level optimization approaches across vision and language datasets, showing superior accuracy-efficiency tradeoffs. More broadly, we show that validation-driven reward metrics offer a principled way to guide the curriculum schedule.
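
The bandit framing in the abstract can be illustrated with a small sketch. This is not the ONLINESUBMOD algorithm itself: the two arms (`facility_location`, `feature_coverage`), the epsilon-greedy arm choice, and the toy `reward_fn` are all assumptions chosen for illustration, and the reward here is only a stand-in for the validation-driven reward the paper describes. It shows the general loop: pick a submodular function, greedily select a subset with it, observe a reward, and update that arm's running estimate.

```python
import numpy as np

def facility_location(sim, S):
    """Representation-style arm: how well subset S covers every point."""
    return sim[:, S].max(axis=1).sum() if S else 0.0

def feature_coverage(feats, S):
    """Coverage-style arm: concave (sqrt) of summed non-negative features."""
    return np.sqrt(feats[S].sum(axis=0)).sum() if S else 0.0

def greedy_select(f, n, budget):
    """Standard greedy maximisation of a set function under a size budget."""
    S = []
    for _ in range(budget):
        base = f(S)
        gains = [(f(S + [i]) - base, i) for i in range(n) if i not in S]
        S.append(max(gains)[1])  # add the element with the largest marginal gain
    return S

def run_bandit(arms, n, budget, reward_fn, steps, eps=0.2, seed=0):
    """Epsilon-greedy bandit over submodular arms (illustrative policy only)."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(len(arms))
    means = np.zeros(len(arms))
    for t in range(steps):
        if t < len(arms):              # pull every arm once to initialise
            a = t
        elif rng.random() < eps:       # explore a random arm
            a = int(rng.integers(len(arms)))
        else:                          # exploit the best running estimate
            a = int(np.argmax(means))
        subset = greedy_select(arms[a], n, budget)
        r = reward_fn(subset)          # placeholder for a validation-driven reward
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]
    return counts, means

# Toy data: 30 samples with 5 non-negative features (hypothetical).
rng = np.random.default_rng(0)
feats = rng.random((30, 5))
sim = feats @ feats.T
arms = [lambda S: facility_location(sim, S),
        lambda S: feature_coverage(feats, S)]
val = rng.random(5)                    # hypothetical "validation" direction
reward_fn = lambda S: float(feats[S].mean(axis=0) @ val)

counts, means = run_bandit(arms, n=30, budget=5, reward_fn=reward_fn, steps=50)
print("pulls per arm:", counts, "mean reward per arm:", means)
```

In this toy setup the reward of each arm is fixed, so the policy quickly settles on one arm. In actual training the model changes between steps, so the reward of each arm drifts and the preferred arm can shift over time.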

Paper Structure

This paper contains 50 sections, 10 theorems, 56 equations, 9 figures, 7 tables, 1 algorithm.

Key Result

Theorem 1

Under Assumptions a-d, for all $t > t_0$, the expected instantaneous regret incurred by the arm selection policy is bounded with high probability; in the bound, $\mathfrak{C}_*$ is the approximation constant corresponding to the optimal arm $a^*$.

Figures (9)

  • Figure 1: Sequential Ordering of Submodular Functions: Observations on CIFAR100. Initial training with subsets sampled using representation-based submodular functions, followed by diversity-based ones, performs better than the opposite order.
  • Figure 2: Test perplexity dynamics on LLAMA-2-7B during training with various online batch selection strategies on MMLU. We evaluate on US Foreign Policy, Anatomy, Sociology, and Chemistry. $\textsc{OnlineSubmod}$ significantly outperforms baselines.
  • Figure 3: Samplewise Submodular Curriculum: $\textsc{OnlineSubmod}$ consistently achieves top-1 accuracy across all subset sizes on TinyImageNet, SVHN, CIFAR-10, and CIFAR-100, and remains competitive on MNIST. Notably, it matches or outperforms all baselines at early subset fractions (10%, 30%) on all datasets except MNIST.
  • Figure 4: Evolution of Term I and Term II (Eq. \ref{eq:SecondApprox}) across training epochs on CIFAR-100.
  • Figure 5: Arm selection distribution over epochs on CIFAR-100. Diversity-based submodular functions become increasingly active during training.
  • ...and 4 more figures

Theorems & Definitions (30)

  • Definition 1
  • Definition 2: Submodularity
  • Definition 3: Monotonicity
  • Definition 4: Maximum High Value Subset
  • Theorem 1: Regret Guarantees
  • Lemma 1: Permutation Invariance of Expected Marginal Gain
  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • ...and 20 more
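
For reference, Definitions 2 and 3 name standard properties of set functions. A minimal statement in their textbook form is sketched below, assuming a ground set $V$ and a set function $f : 2^V \to \mathbb{R}$; these symbols are assumptions, and the paper's own notation may differ.

```latex
% Submodularity (diminishing returns)
f(A \cup \{x\}) - f(A) \;\ge\; f(B \cup \{x\}) - f(B)
\quad \text{for all } A \subseteq B \subseteq V,\; x \in V \setminus B.

% Monotonicity
f(A) \le f(B) \quad \text{for all } A \subseteq B \subseteq V.
```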