HuCurl: Human-induced Curriculum Discovery

Mohamed Elgaar; Hadi Amiri

TL;DR

This work introduces the problem of curriculum discovery and describes a curriculum learning framework that discovers effective curricula in a curriculum space based on prior knowledge about sample difficulty. The framework encompasses some existing curriculum learning approaches and can discover curricula that outperform them across several NLP tasks.

Abstract

We introduce the problem of curriculum discovery and describe a curriculum learning framework capable of discovering effective curricula in a curriculum space based on prior knowledge about sample difficulty. Using annotation entropy and loss as measures of difficulty, we show that (i) the top-performing discovered curricula for a given model and dataset are often non-monotonic, in contrast to the monotonic curricula in existing literature, (ii) the prevailing easy-to-hard or hard-to-easy transition curricula are often at risk of underperforming, and (iii) the curricula discovered for smaller datasets and models perform well on larger datasets and models, respectively. The proposed framework encompasses some of the existing curriculum learning approaches and can discover curricula that outperform them across several NLP tasks.

Paper Structure (25 sections, 7 equations, 9 figures, 4 tables)

Introduction
Related Work
Curriculum Discovery Framework
Monotonic Curricula
Non-monotonic Curricula
Parameter Optimization
Prior Knowledge of Difficulty
Experiments
Datasets
Baselines
No-CL
Self-paced Learning (SPL)
Mentornet
Difficulty Prediction (DP)
SuperLoss (SL)
...and 10 more sections

Figures (9)

Figure 1: The model defines a difficulty score based on prior knowledge about sample difficulty and assigns samples to $k$ difficulty groups before training, e.g., easy, medium, and hard for $k=3$. A curriculum is defined for each difficulty group, which dynamically weights sample losses according to their difficulty groups. Each curriculum is defined by a pair of parameters $(r, s)$ that will be optimized to discover an optimized curriculum based on sample difficulty and model behavior.
Figure 2: Generalized logistic functions for curriculum discovery. (a) shows the effect of the rate and shift parameters $(r,s)$ in (\ref{eq:glf}), listed in the legend in that order. (b) is a specific parameter configuration for a curriculum that first introduces easier samples to a model, and then medium and hard samples as training progresses.
Figure 3: Distributions of entropy and loss in our datasets. Samples of the easy class are to the left of the first vertical line and shaded in green, those of the medium class are between the two vertical lines and shaded in orange, and samples of the hard class are to the right of the second line and shaded in red.
Figure 4: Each caption is composed of the first character of the name of a dataset: {ChaosNLI, SNLI, Twitter, Reddit}, followed by the type of the dataset {Difficulty-balanced or Full}, and the difficulty score used {Entropy, Loss} in experiments. The x-axis is the training progress and y-axis is the confidence assigned to samples of a difficulty-class. The green line (circle marker) is easy, orange line (x marker) is medium, and red line (diamond marker) is hard. The solid line is the mean of the top 25 performing configurations for each dataset and scoring function pair, and the shaded area represents the 95% CI.
Figure 5: Notation is the same as Figure \ref{fig:configs}: {ChaosNLI, SNLI, Twitter, Reddit}, followed by the type of the dataset {Difficulty-balanced or Full}, and the difficulty score used {Entropy, Loss}. The x-axis lists curricula discovered using a particular dataset and scoring function, and the increasing curriculum inc (Figure \ref{fig:inc_cfg}). The y-axis lists models that are trained using each curriculum. For example, the cell at the intersection of row "S-F-L" and column "T-F-E" represents a model trained on SNLI full partitioned by loss, using the curriculum discovered for the full Twitter dataset partitioned by entropy (Figure \ref{fig:afe_cfg}). Each row of the table is normalized to match the scales of different models (after normalization, the max of each row is 100).
...and 4 more figures
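
The captions of Figures 1 and 2 describe the weighting scheme only in words, so a minimal sketch may help. It assumes the standard logistic form w(t; r, s) = 1 / (1 + exp(-r (t - s))) with training progress t in [0, 1]; the paper's exact equation may differ. The function names, the equal-sized quantile grouping, and the (r, s) values below are illustrative assumptions, not taken from the paper.

```python
import math


def logistic_weight(t, r, s):
    """Weight in (0, 1) at training progress t for rate r and shift s.

    Assumed form: 1 / (1 + exp(-r * (t - s))). Moderate |r| is assumed;
    very large values could overflow math.exp.
    """
    return 1.0 / (1.0 + math.exp(-r * (t - s)))


def assign_groups(scores, k=3):
    """Split samples into k roughly equal-sized groups by ascending difficulty.

    Equal-sized quantile grouping is an assumption made for this sketch.
    Group 0 is the easiest, group k - 1 the hardest.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    groups = [0] * len(scores)
    for rank, i in enumerate(order):
        groups[i] = min(rank * k // len(scores), k - 1)
    return groups


def weighted_loss(losses, groups, t, params):
    """Mean of per-sample losses, each scaled by its group's curriculum weight."""
    weights = [logistic_weight(t, *params[g]) for g in groups]
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)


# Hypothetical (r, s) pairs: easy samples are weighted up first,
# then medium, then hard, as in the configuration of Figure 2(b).
easy_first = {0: (10.0, 0.0), 1: (10.0, 0.4), 2: (10.0, 0.8)}

scores = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2]   # e.g. annotation entropy per sample
losses = [1.2, 0.3, 0.8, 0.5, 1.0, 0.4]   # per-sample training losses
groups = assign_groups(scores, k=3)
early = weighted_loss(losses, groups, t=0.1, params=easy_first)
late = weighted_loss(losses, groups, t=0.9, params=easy_first)
```

A positive rate gives a weight that grows with training progress and a negative rate gives one that shrinks, so one (r, s) pair per difficulty group can express easy-to-hard, hard-to-easy, and mixed schedules. In this sketch the pairs are fixed by hand; the framework described above optimizes them instead.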
