Bandit Guided Submodular Curriculum for Adaptive Subset Selection

Prateek Chanda; Prayas Agrawal; Saral Sureka; Lokesh Reddy Polu; Atharv Kshirsagar; Ganesh Ramakrishnan

TL;DR

This work reframes adaptive data subset selection for curriculum learning as a multi-armed bandit problem in which each arm is a submodular function guiding sample selection. It introduces OnlineSubmod, a no-regret greedy policy that optimizes a validation-driven reward to adaptively schedule the curriculum across training. Theoretical guarantees establish regret bounds and convergence to the best arm, while experiments on vision and language tasks show superior accuracy-efficiency tradeoffs with modest overhead from submodular selection. The approach emphasizes principled, validation-informed curriculum dynamics and demonstrates scalable, data-efficient training for large models.

Abstract

Traditional curriculum learning proceeds from easy to hard samples, yet defining a reliable notion of difficulty remains elusive. Prior work has used submodular functions to induce difficulty scores in curriculum learning. We reinterpret adaptive subset selection and formulate it as a multi-armed bandit problem, where each arm corresponds to a submodular function guiding sample selection. We introduce ONLINESUBMOD, a novel online greedy policy that optimizes a utility-driven reward and provably achieves no-regret performance under various sampling regimes. Empirically, ONLINESUBMOD outperforms both traditional curriculum learning and bi-level optimization approaches across vision and language datasets, showing superior accuracy-efficiency tradeoffs. More broadly, we show that validation-driven reward metrics offer a principled way to guide the curriculum schedule.
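To make the bandit-over-submodular-functions framing concrete, the following is a minimal illustrative sketch and not the ONLINESUBMOD policy itself. It assumes a generic epsilon-greedy bandit, two standard monotone submodular functions (facility location and a square-root feature-based function) as arms, nonnegative feature vectors, and a caller-supplied `reward_fn` standing in for the validation-driven reward. All function and class names here are hypothetical.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def facility_location(S, X):
    """f(S) = sum_i max_{j in S} <x_i, x_j>; rewards representative subsets."""
    if not S:
        return 0.0
    return sum(max(dot(X[i], X[j]) for j in S) for i in range(len(X)))


def feature_based(S, X):
    """f(S) = sum_d sqrt(sum_{j in S} x_j[d]); rewards feature coverage."""
    return sum(math.sqrt(sum(X[j][d] for j in S)) for d in range(len(X[0])))


def greedy_select(f, X, k):
    """Standard greedy maximization: repeatedly add the largest marginal gain."""
    selected, remaining = [], list(range(len(X)))
    for _ in range(min(k, len(remaining))):
        base = f(selected, X)
        best = max(remaining, key=lambda j: f(selected + [j], X) - base)
        selected.append(best)
        remaining.remove(best)
    return selected


class EpsilonGreedyBandit:
    """Generic epsilon-greedy bandit (an assumption, not the paper's policy)."""

    def __init__(self, n_arms, epsilon=0.1, seed=0):
        self.counts = [0] * n_arms
        self.values = [0.0] * n_arms  # running mean reward per arm
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def choose(self):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))  # explore
        return max(range(len(self.counts)), key=lambda a: self.values[a])

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def run_curriculum(batches, arms, reward_fn, k, epsilon=0.1, seed=0):
    """Per batch: pick an arm, greedily select k samples, observe a reward."""
    bandit = EpsilonGreedyBandit(len(arms), epsilon, seed)
    history = []
    for X in batches:
        arm = bandit.choose()
        subset = greedy_select(arms[arm], X, k)
        # reward_fn is a placeholder for a validation-driven utility signal.
        bandit.update(arm, reward_fn(X, subset))
        history.append((arm, subset))
    return bandit, history
```

In an actual training loop, `reward_fn` would be computed from held-out validation data after a model update on the selected subset. The sketch leaves that choice to the caller rather than guessing at the paper's exact reward definition.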
