Dual-Criterion Curriculum Learning: Application to Temporal Data

Gaspard Abel, Eloi Campagne, Mohamed Benloughmari, Argyris Kalogeratos

Abstract

Curriculum Learning (CL) is a meta-learning paradigm that trains a model by feeding it data instances incrementally, following a schedule of increasing difficulty. Defining meaningful difficulty measures is crucial and is usually the main bottleneck for effective learning; moreover, the heuristics employed are often application-specific. In this work, we propose the Dual-Criterion Curriculum Learning (DCCL) framework, which combines two views of instance-wise difficulty: a loss-based criterion complemented by a density-based criterion learned in the data representation space. Essentially, DCCL calibrates training-based evidence (loss) on the premise that data sparseness amplifies learning difficulty. As a testbed, we choose time-series forecasting and evaluate the framework on multivariate time-series benchmarks under the standard One-Pass and Baby-Steps training schedules. Empirical results indicate the benefit of density-based and hybrid dual-criterion curricula over loss-only baselines and standard non-CL training in this setting.

Paper Structure (18 sections, 5 equations, 6 figures, 3 tables)

This paper contains 18 sections, 5 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Schema of the proposed modular DCCL framework. A representation model $\phi_{\boldsymbol{\theta}}$ maps each data instance $\boldsymbol{x}_n\in\mathcal{D}$ to a vector $\phi_{\boldsymbol{\theta}}(\boldsymbol{x}_n)\in\mathbb{R}^d$. Based on these representations, a difficulty module $\delta$ (dashed block) assigns a score to each instance. These scores are then partitioned into $K$ ordered curriculum buckets $\mathcal{B}_1\preceq\cdots\preceq\mathcal{B}_K$ (easy to hard) via Adaptive Filtering. Next, the model $f_{\boldsymbol{\theta}}$ is fine-tuned sequentially on each training set $\mathcal{C}_k$, given the Training Scheduler.
  • Figure 2: Scatter plot of loss vs. density scores for training instances. The complementarity of these two difficulty criteria motivates hybrid strategies that combine both views.
  • Figure 3: Effect of the convex-mixing parameter $\alpha$ on curriculum buckets. As $\alpha$ varies from $0$ (pure density-based) to $1$ (pure loss-based), the resulting difficulty ordering smoothly interpolates between the strategies.
  • Figure 4: Bivariate loss-density stratification. Instances are binned into a 2D grid according to their loss and density scores, then ordered cell-by-cell from easy (low loss, high density) to hard (high loss, low density). Each cell is taken as a bucket.
  • Figure 5: Percentage improvement over the No-curriculum baseline. Top: The results concern each strategy (rows) and dataset/schedule combination (columns). Green cells indicate improvement; red cells indicate degradation. Bottom: Radar plots of percentage improvement over No-curriculum per strategy, separately for the One-Pass (left) and Baby-Steps (right) schedules. Each axis corresponds to a dataset; polygons further from the black reference circle (0% improvement) indicate stronger gains.
  • ...and 1 more figure
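
The convex mixing described for Figure 3 and the easy-to-hard bucketing of Figure 1 can be illustrated with a minimal sketch. The following is not the authors' implementation. The rank normalisation, the definition of density difficulty as one minus the normalised density, and the equal-size split into buckets (used here in place of the paper's Adaptive Filtering, whose details are not given in this summary) are all assumptions, and every function name is hypothetical. The two schedule helpers follow the common reading of One-Pass (train on each bucket in turn) and Baby-Steps (train on the cumulative union of buckets).

```python
import numpy as np


def rank_normalize(values):
    """Map values to [0, 1] by rank (0 = smallest, 1 = largest). Assumed normalisation."""
    values = np.asarray(values, dtype=float)
    ranks = np.argsort(np.argsort(values))
    return ranks / max(len(values) - 1, 1)


def mixed_difficulty(loss, density, alpha):
    """Convex mix of the two criteria: alpha=0 is pure density, alpha=1 is pure loss."""
    loss_score = rank_normalize(loss)
    # Assumption: sparse (low-density) instances are treated as harder.
    sparsity_score = 1.0 - rank_normalize(density)
    return alpha * loss_score + (1.0 - alpha) * sparsity_score


def make_buckets(difficulty, num_buckets):
    """Split instance indices into ordered, near-equal buckets (easy first).

    Stand-in for the paper's Adaptive Filtering step, which is not specified here.
    """
    order = np.argsort(difficulty, kind="stable")
    return np.array_split(order, num_buckets)


def one_pass_stages(buckets):
    """One-Pass: each training stage uses only the current bucket."""
    return [np.asarray(b) for b in buckets]


def baby_steps_stages(buckets):
    """Baby-Steps: each training stage uses all buckets seen so far."""
    return [np.concatenate(buckets[: k + 1]) for k in range(len(buckets))]


if __name__ == "__main__":
    # Toy per-instance scores, invented for illustration only.
    loss = [0.1, 0.9, 0.5, 0.3]
    density = [0.2, 0.8, 0.4, 0.6]
    for alpha in (0.0, 0.5, 1.0):
        scores = mixed_difficulty(loss, density, alpha)
        buckets = make_buckets(scores, num_buckets=2)
        print(alpha, [b.tolist() for b in buckets])
```

With `alpha` between the two extremes, an instance that has low loss but lies in a sparse region moves toward later buckets, which is the calibration effect the abstract describes.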