
Data Distribution-based Curriculum Learning

Shonal Chaudhry, Anuraganand Sharma

TL;DR

Data Distribution-based Curriculum Learning (DDCL) uses the inherent data distribution of a dataset to order training samples into a curriculum. This improves accuracy and convergence rate, suggesting potential for more efficient training.

Abstract

The order of training samples can have a significant impact on the performance of a classifier. Curriculum learning is a method of ordering training samples from easy to hard. This paper proposes a novel curriculum learning approach called Data Distribution-based Curriculum Learning (DDCL), which uses the data distribution of a dataset to build a curriculum, that is, an order in which the training samples are presented. Two scoring methods, DDCL (Density) and DDCL (Point), score the training samples and thereby determine their training order: DDCL (Density) assigns scores based on sample density, while DDCL (Point) scores samples using the Euclidean distance. We evaluate DDCL through experiments on multiple datasets using a neural network, a support vector machine and a random forest classifier. The results show that applying DDCL improves the average classification accuracy on all datasets compared with standard training without a curriculum. Moreover, analysis of the error loss over a single training epoch shows that DDCL converges faster than the no-curriculum method.
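The two scoring ideas can be illustrated with a small sketch. It assumes, hypothetically, that a DDCL (Point)-style score is a sample's Euclidean distance to its class centroid, and it substitutes a k-nearest-neighbour density proxy (mean distance to the k nearest same-class samples) for the DDCL (Density) score; the paper's exact scoring rules may differ. In both cases a lower score is treated as "easier", and the curriculum is simply the ascending sort of the scores. The names `point_scores`, `density_scores`, `curriculum_order` and the parameter `k` are illustrative and do not come from the paper.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def point_scores(X, y):
    """Hypothetical DDCL (Point)-style score: distance to the sample's class centroid.

    Assumption: samples close to their class centroid are easier.
    """
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    # Per-class centroid: the column-wise mean of that class's samples.
    centroids = {
        label: [sum(col) / len(pts) for col in zip(*pts)]
        for label, pts in groups.items()
    }
    return [euclidean(xi, centroids[yi]) for xi, yi in zip(X, y)]


def density_scores(X, y, k=2):
    """Hypothetical density score (kNN proxy, not necessarily the paper's method).

    Mean distance to the k nearest same-class samples; a smaller value means
    the sample lies in a denser region and is treated as easier.
    """
    scores = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        dists = sorted(
            euclidean(xi, xj)
            for j, (xj, yj) in enumerate(zip(X, y))
            if j != i and yj == yi
        )
        nearest = dists[:k]
        # A class with a single sample has no neighbours: treat it as hardest.
        scores.append(sum(nearest) / len(nearest) if nearest else float("inf"))
    return scores


def curriculum_order(scores):
    """Sample indices sorted from easiest (lowest score) to hardest."""
    return sorted(range(len(scores)), key=lambda i: scores[i])


if __name__ == "__main__":
    # Two toy classes, each with one outlier: (5, 5) and (6, 6).
    X = [[0, 0], [1, 0], [0, 1], [5, 5], [10, 10], [11, 10], [10, 11], [6, 6]]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    print(curriculum_order(point_scores(X, y)))         # → [4, 1, 2, 5, 6, 0, 7, 3]
    print(curriculum_order(density_scores(X, y, k=2)))  # → [0, 4, 1, 2, 5, 6, 7, 3]
```

Both orderings place the samples near each class core first and the two outliers last, which is the easy-to-hard ordering that curriculum learning calls for. A classifier would then be fed the samples in this order instead of a random shuffle; because Python's `sorted` is stable, samples with tied scores keep their original relative order.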

Paper Structure

This paper contains 11 sections, 6 figures, 5 tables, 3 algorithms.

Figures (6)

  • Figure 1: Data Distribution-based Curriculum Learning
  • Figure 2: Example of DDCL scoring methods on Haberman's Survival data.
  • Figure 3: Precision-Recall curves for binary classification datasets.
  • Figure 4: Confusion matrix for multi-class classification datasets.
  • Figure 5: Error loss per epoch for neural network classifier.
  • ...and 1 more figure