A Psychology-based Unified Dynamic Framework for Curriculum Learning
Guangyu Meng, Qingkai Zeng, John P. Lalor, Hong Yu
TL;DR
PUDF is a psychology-based unified dynamic framework for curriculum learning. It combines IRT-based Artificial Crowds (IRT-AC), which label training-data difficulty, with Dynamic Data Selection via Model Ability Estimation (DDS-MAE), a data scheduler guided by the model's estimated ability. The approach yields global, interpretable difficulty scores and dynamic, data-efficient training that speeds convergence while improving accuracy across large language models and diverse tasks. Empirically, PUDF outperforms standard fine-tuning and several state-of-the-art CL methods in both training speed and predictive performance, with notable gains on large-scale datasets such as AG News and challenging tasks such as MedQA. The work presents PUDF as scalable, grounded in a single psychometric theory (IRT), and potentially applicable to generative tasks, offering a principled path for adaptive curriculum design in NLP and beyond.
Abstract
Learning directly from examples of varying difficulty is often challenging for both humans and machine learning models. A more effective strategy is to expose learners to examples in a progressive order, from easy to difficult. Curriculum Learning (CL) has been proposed to implement this strategy in machine learning model training. However, two key challenges persist in CL framework design: defining the difficulty of training data and determining the appropriate amount of data to input at each training step. Drawing inspiration from psychometrics, this paper presents a Psychology-based Unified Dynamic Framework for Curriculum Learning (PUDF). We quantify the difficulty of training data by applying Item Response Theory (IRT) to responses from Artificial Crowds (AC). This theory-driven IRT-AC approach yields global (i.e., model-independent) and interpretable difficulty values. Leveraging IRT, we propose a training strategy, Dynamic Data Selection via Model Ability Estimation (DDS-MAE), to schedule the appropriate amount of data during model training. Because our difficulty labeling and model ability estimation rest on the same theory, namely IRT, their values are comparable on a common scale, potentially leading to better-aligned training data selection and faster convergence than other CL methods. Experimental results demonstrate that fine-tuning pre-trained large language models with PUDF leads to higher accuracy and faster convergence on a suite of benchmark datasets compared to standard fine-tuning and state-of-the-art CL methods. Ablation studies and downstream analyses further validate the impact of PUDF for CL.
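To make the link between difficulty labels and ability-guided scheduling concrete, the following is a minimal sketch and not the authors' implementation. It assumes a one-parameter (Rasch) IRT model, which the abstract does not specify, takes the per-example difficulties as already given (in PUDF they would come from fitting IRT to artificial-crowd responses), and assumes a simple selection rule that keeps examples whose difficulty does not exceed the current ability estimate. The function names `p_correct`, `estimate_ability`, and `select_examples` are hypothetical.

```python
import math


def p_correct(theta, b):
    """Rasch (1PL) probability that a learner with ability theta
    answers an item of difficulty b correctly (assumed model)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_ability(responses, difficulties, steps=50, bound=6.0):
    """Maximum-likelihood ability estimate under the 1PL model,
    found with Newton-Raphson. `responses` holds 0/1 correctness
    flags of the model on items with the given difficulties."""
    theta = 0.0
    for _ in range(steps):
        ps = [p_correct(theta, b) for b in difficulties]
        grad = sum(y - p for y, p in zip(responses, ps))  # d logL / d theta
        info = sum(p * (1.0 - p) for p in ps)             # Fisher information
        if info < 1e-12:
            break
        step = grad / info
        # Clamp so all-correct or all-wrong response patterns stay finite.
        theta = max(-bound, min(bound, theta + step))
        if abs(step) < 1e-8:
            break
    return theta


def select_examples(difficulties, theta):
    """Assumed selection rule: keep indices of examples whose
    difficulty is at most the current ability estimate."""
    return [i for i, b in enumerate(difficulties) if b <= theta]


# Toy data: five items, the model gets the three easiest ones right.
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
responses = [1, 1, 1, 0, 0]
theta = estimate_ability(responses, difficulties)
selected = select_examples(difficulties, theta)
```

Because ability and difficulty sit on the same IRT scale, the comparison `b <= theta` needs no extra calibration. In a training loop, one would re-estimate `theta` periodically (for example, once per epoch) so that the selected subset grows as the model improves.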
