Statistical curriculum learning: An elimination algorithm achieving an oracle risk
Omer Cohen, Ron Meir, Nir Weinberger
TL;DR
The paper studies statistical curriculum learning in a parametric mean-estimation setting with a target task and multiple source tasks differing in similarity $Q_t$ and noise variance $\sigma_t^2$. It introduces an adaptive, multi-round source-elimination CL algorithm that prunes sources based on estimated similarity and a quantified elimination curve, achieving weak-oracle-level risk after $O(\log T)$ rounds and, in the single-source case, matching the strong-oracle rate. The work presents two minimax lower bounds under localized problem instances, discusses the challenges of constructing homogeneous instance sets, and identifies regimes where the weak oracle is minimax optimal (notably $T\le 2$); it also extends the framework to unknown variances/covariances and supports empirical validation. Overall, the results provide a principled, theoretically grounded approach to curriculum design in statistical learning with multiple sources, highlighting when adaptive sampling yields optimal or near-optimal risk. The findings have implications for transfer/meta-learning and structured CL in high-dimensional parametric settings.
Abstract
We consider a statistical version of curriculum learning (CL) in a parametric prediction setting. The learner is required to estimate a target parameter vector, and can adaptively collect samples from either the target model or other source models that are similar to the target model but less noisy. We consider three types of learners, depending on the level of side-information they receive. The first two, referred to as strong/weak-oracle learners, receive high/low degrees of information about the models, and use these to learn. The third, a fully adaptive learner, estimates the target parameter vector without any prior information. In the single-source case, we propose an elimination learning method whose risk matches that of a strong-oracle learner. In the multiple-source case, we argue that the risk of the weak-oracle learner is a realistic benchmark for the risk of adaptive learners. We develop an adaptive CL algorithm with multiple elimination rounds, and characterize instance-dependent conditions for its risk to match that of the weak-oracle learner. We consider instance-dependent minimax lower bounds, and discuss the challenges associated with defining the class of instances for the bound. We derive two minimax lower bounds, and determine the conditions under which the performance of the weak-oracle learner is minimax optimal.
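
To make the elimination idea concrete, the following is a minimal toy sketch in Python, not the algorithm analyzed in the paper. It assumes Gaussian mean estimation with known noise levels, an equal per-round sample budget, and a hypothetical elimination rule: a source is dropped once its estimated distance from the target estimate exceeds a multiple of the current estimation noise. The function name `eliminate_sources`, the constant `3.0`, and all parameter names are illustrative assumptions.

```python
import numpy as np

def eliminate_sources(target_mean, source_means, source_sigmas, target_sigma,
                      n_rounds=4, n_per_round=200, seed=0):
    """Toy multi-round source elimination for Gaussian mean estimation.

    Illustrative only: the threshold rule below is an assumption,
    not the elimination curve defined in the paper.
    """
    rng = np.random.default_rng(seed)
    d = target_mean.shape[0]
    alive = list(range(len(source_means)))
    tgt_sum, tgt_n = np.zeros(d), 0
    src_sum = {t: np.zeros(d) for t in alive}
    src_n = {t: 0 for t in alive}
    tgt_hat = np.zeros(d)

    for _ in range(n_rounds):
        # Draw fresh target samples and update the running target estimate.
        x = target_mean + target_sigma * rng.standard_normal((n_per_round, d))
        tgt_sum += x.sum(axis=0)
        tgt_n += n_per_round
        tgt_hat = tgt_sum / tgt_n
        tgt_radius = target_sigma * np.sqrt(d / tgt_n)  # rough error scale

        survivors = []
        for t in alive:
            # Draw fresh samples from each surviving source.
            y = source_means[t] + source_sigmas[t] * rng.standard_normal((n_per_round, d))
            src_sum[t] += y.sum(axis=0)
            src_n[t] += n_per_round
            src_hat = src_sum[t] / src_n[t]
            src_radius = source_sigmas[t] * np.sqrt(d / src_n[t])

            # Keep the source only while its gap to the target estimate
            # is indistinguishable from estimation noise (assumed rule).
            gap = np.linalg.norm(src_hat - tgt_hat)
            if gap <= 3.0 * (tgt_radius + src_radius):
                survivors.append(t)
        alive = survivors

    return alive, tgt_hat

# Usage: source 0 shares the target mean, source 1 is far from it.
alive, tgt_hat = eliminate_sources(
    target_mean=np.zeros(5),
    source_means=[np.zeros(5), 2.0 * np.ones(5)],
    source_sigmas=[0.1, 0.1],
    target_sigma=1.0,
)
print(alive)
```

In this sketch the threshold shrinks as samples accumulate, so dissimilar sources are pruned in early rounds while sources whose distance to the target is below the current resolution survive. How to set that threshold, and how to use the surviving sources in the final estimate, is exactly what the paper's analysis addresses and is not reproduced here.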
