Dual-Criterion Curriculum Learning: Application to Temporal Data

Gaspard Abel; Eloi Campagne; Mohamed Benloughmari; Argyris Kalogeratos

Abstract

Curriculum Learning (CL) is a meta-learning paradigm that trains a model by feeding it data instances incrementally, according to a schedule based on difficulty progression. Defining meaningful difficulty measures is crucial and is usually the main bottleneck for effective learning; moreover, the employed heuristics are in many cases application-specific. In this work, we propose the Dual-Criterion Curriculum Learning (DCCL) framework, which combines two views of instance-wise difficulty: a loss-based criterion is complemented by a density-based criterion learned in the data representation space. In essence, DCCL calibrates training-based evidence (the loss) on the premise that data sparsity amplifies learning difficulty. As a testbed, we choose the time-series forecasting task. We evaluate our framework on multivariate time-series benchmarks under the standard One-Pass and Baby-Steps training schedules. Empirical results show the benefits of density-based and hybrid dual-criterion curricula over loss-only baselines and standard non-CL training in this setting.
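The abstract names the One-Pass and Baby-Steps schedules without defining them. The minimal sketch below assumes their conventional meaning in the curriculum-learning literature: One-Pass trains at stage $k$ on bucket $\mathcal{B}_k$ alone, while Baby-Steps trains at stage $k$ on the cumulative union $\mathcal{B}_1\cup\cdots\cup\mathcal{B}_k$. The helper `curriculum_stages` is hypothetical and not code from the paper.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def curriculum_stages(buckets: Sequence[Sequence[T]], schedule: str) -> List[List[T]]:
    """Return the training set C_k used at each curriculum stage k.

    buckets: B_1, ..., B_K, already ordered from easy to hard.
    schedule: "one-pass" means stage k trains on B_k only;
              "baby-steps" means stage k trains on B_1 + ... + B_k (cumulative).
    """
    if schedule not in ("one-pass", "baby-steps"):
        raise ValueError(f"unknown schedule: {schedule!r}")
    stages: List[List[T]] = []
    seen: List[T] = []  # running union of all buckets visited so far
    for bucket in buckets:
        if schedule == "one-pass":
            stages.append(list(bucket))
        else:
            seen.extend(bucket)
            stages.append(list(seen))  # copy so later stages do not alias earlier ones
    return stages


buckets = [[0, 1], [2, 3], [4]]  # instance indices, easy to hard
print(curriculum_stages(buckets, "one-pass"))    # [[0, 1], [2, 3], [4]]
print(curriculum_stages(buckets, "baby-steps"))  # [[0, 1], [0, 1, 2, 3], [0, 1, 2, 3, 4]]
```

Under this reading, Baby-Steps keeps revisiting the easier instances at every later stage, whereas One-Pass shows each bucket only once.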

Paper Structure (18 sections, 5 equations, 6 figures, 3 tables)

Introduction and Related Work
Preliminaries on Curriculum Learning
A Framework for Curriculum Learning
The DCCL Curriculum Learning Pipeline
Single-Criterion Curriculum Learning
Dual-Criterion Curriculum Learning
Experiments
Datasets
Experimental settings
Stage 1: representation model training and difficulty extraction.
Representation architectures.
Stage 2: curriculum model training.
Final evaluation.
Experimental Results
Insights on the training under CL strategies

Figures (6)

Figure 1: Schema of the proposed modular DCCL framework. A representation model $\phi_{\boldsymbol{\theta}}$ maps each data instance $\boldsymbol{x}_n\in\mathcal{D}$ to a vector $\phi_{\boldsymbol{\theta}}(\boldsymbol{x}_n)\in\mathbb{R}^d$. Based on these representations, a difficulty module $\delta$ (dashed block) assigns a score to each instance. The instances are then partitioned by score into $K$ ordered curriculum buckets $\mathcal{B}_1\preceq\cdots\preceq\mathcal{B}_K$ (easy to hard) via Adaptive Filtering. Finally, the model $f_{\boldsymbol{\theta}}$ is fine-tuned sequentially on each training set $\mathcal{C}_k$ according to the Training Scheduler.
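Figure 1 leaves the difficulty module $\delta$ and Adaptive Filtering abstract. As an illustrative sketch only, the code below assumes a k-nearest-neighbor sparsity score computed on the representations $\phi_{\boldsymbol{\theta}}(\boldsymbol{x}_n)$ (a larger mean distance to neighbors means a sparser region, hence a harder instance) and substitutes a plain equal-size split for Adaptive Filtering. Neither `knn_sparsity` nor `split_into_buckets` comes from the paper.

```python
import numpy as np


def knn_sparsity(z: np.ndarray, k: int = 5) -> np.ndarray:
    """Mean Euclidean distance from each row of z to its k nearest other rows.

    z has shape (n, d): one representation phi_theta(x_n) per instance.
    Larger values mean a sparser neighborhood, treated here as harder.
    """
    n = z.shape[0]
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    # Pairwise distances via broadcasting: (n, 1, d) - (1, n, d) -> (n, n).
    dist = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    nearest = np.sort(dist, axis=1)[:, :k]  # k smallest distances per row
    return nearest.mean(axis=1)


def split_into_buckets(difficulty: np.ndarray, n_buckets: int) -> list:
    """Sort instances easy-to-hard and cut them into n_buckets near-equal chunks.

    Stand-in for Adaptive Filtering, whose details are not given here.
    """
    order = np.argsort(difficulty, kind="stable")
    return [chunk.tolist() for chunk in np.array_split(order, n_buckets)]


z = np.array([[0.0], [1.0], [2.0], [10.0]])  # toy 1-D representations
s = knn_sparsity(z, k=1)
print(s)                         # [1. 1. 1. 8.]
print(split_into_buckets(s, 2))  # [[0, 1], [2, 3]]
```

The brute-force distance matrix costs O(n^2) memory, so for large training sets an index-based neighbor search would be the practical choice.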
Figure 2: Scatter plot of loss vs. density scores for training instances. The complementarity of these two difficulty criteria motivates hybrid strategies that combine both views.
Figure 3: Effect of the convex-mixing parameter $\alpha$ on curriculum buckets. As $\alpha$ varies from $0$ (pure density-based) to $1$ (pure loss-based), the resulting difficulty ordering interpolates smoothly between the two single-criterion strategies.
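A minimal sketch of the convex mixing in Figure 3, assuming the mixed difficulty takes the form $d_\alpha(\boldsymbol{x}_n)=\alpha\,\tilde{\ell}_n+(1-\alpha)\,\tilde{s}_n$, where $\tilde{\ell}_n$ is the loss and $\tilde{s}_n$ a sparsity (inverse-density) score, both min-max normalized to $[0,1]$. The normalization choice and the names `minmax` and `mixed_difficulty` are assumptions; only the endpoints ($\alpha=0$ density-only, $\alpha=1$ loss-only) come from the caption.

```python
import numpy as np


def minmax(x) -> np.ndarray:
    """Rescale scores to [0, 1]; a constant vector maps to all zeros."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span


def mixed_difficulty(loss, sparsity, alpha: float) -> np.ndarray:
    """Convex mixture: alpha=1 is pure loss-based, alpha=0 pure density-based.

    sparsity should grow as density falls (e.g. a mean k-NN distance),
    so that larger mixed scores always mean harder instances.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * minmax(loss) + (1.0 - alpha) * minmax(sparsity)


loss = [0.0, 1.0, 2.0]
sparsity = [2.0, 1.0, 0.0]  # toy scores where the two criteria disagree
for a in (0.0, 0.5, 1.0):
    print(a, mixed_difficulty(loss, sparsity, a))
```

Normalizing first matters because raw losses and density scores live on unrelated scales; without it, $\alpha$ would not control the balance between the two views in a meaningful way.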
Figure 4: Bivariate loss-density stratification. Instances are binned into a 2D grid according to their loss and density scores, then ordered cell by cell from easy (low loss, high density) to hard (high loss, low density). Each cell forms one curriculum bucket.
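Figure 4 does not specify the grid resolution or the exact order in which cells are visited. The sketch below assumes equal-frequency (rank-based) bins on each axis and visits cells by increasing sum of the loss-bin and sparsity-bin indices, breaking ties in favor of the lower loss bin; a high sparsity bin stands for low density, and empty cells are skipped. The helpers `rank_bins` and `grid_buckets` are hypothetical.

```python
import numpy as np


def rank_bins(x, n_bins: int) -> np.ndarray:
    """Equal-frequency binning: bin 0 holds the lowest values, n_bins-1 the highest."""
    x = np.asarray(x)
    ranks = np.argsort(np.argsort(x, kind="stable"), kind="stable")  # ranks 0..n-1
    return ranks * n_bins // len(x)


def grid_buckets(loss, sparsity, n_bins: int = 2) -> list:
    """Bucket instances cell by cell on a (loss bin, sparsity bin) grid.

    High sparsity bins correspond to low density. Cells are visited by
    increasing (loss_bin + sparsity_bin), ties broken by the lower loss bin;
    empty cells are skipped.
    """
    lb = rank_bins(loss, n_bins)
    sb = rank_bins(sparsity, n_bins)
    cells = sorted({(int(a), int(b)) for a, b in zip(lb, sb)},
                   key=lambda c: (c[0] + c[1], c[0]))
    return [np.flatnonzero((lb == a) & (sb == b)).tolist() for a, b in cells]


loss = [0.1, 0.2, 0.9, 0.8]
sparsity = [0.3, 0.1, 0.7, 0.9]
print(grid_buckets(loss, sparsity))  # [[0, 1], [2, 3]]
```

Other traversal orders (for example, loss-major row by row) are equally compatible with the caption; the anti-diagonal order used here is just one way to go from the low-loss/high-density corner to the high-loss/low-density corner.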
Figure 5: Percentage improvement over the No-curriculum baseline. Top: results for each strategy (rows) and dataset/schedule combination (columns); green cells indicate improvement and red cells indicate degradation. Bottom: radar plots of percentage improvement over No-curriculum per strategy, shown separately for the One-Pass (left) and Baby-Steps (right) schedules. Each axis corresponds to a dataset; polygons farther from the black reference circle (0% improvement) indicate stronger gains.
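The caption does not state the metric or formula behind the percentages. Assuming a lower-is-better forecasting error $E$ (such as MSE), a natural definition is $100\,(E_{\text{No-CL}}-E_{\text{strategy}})/E_{\text{No-CL}}$, so positive values match the green cells and negative values the red ones. The helper `pct_improvement` is hypothetical, and the numbers below are made up, not results from the paper.

```python
def pct_improvement(baseline_error: float, strategy_error: float) -> float:
    """Percentage improvement of a CL strategy over the No-curriculum baseline.

    Assumes a lower-is-better error metric: positive = improvement,
    negative = degradation, 0 = same error as the baseline.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - strategy_error) / baseline_error


# Toy numbers for illustration only.
print(pct_improvement(0.50, 0.25))  # 50.0
print(pct_improvement(0.50, 0.75))  # -50.0
```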
