DAST: Difficulty-Aware Self-Training on Large Language Models
Boyang Xue, Qi Zhu, Hongru Wang, Rui Wang, Sheng Wang, Hongling Xu, Fei Mi, Yasheng Wang, Lifeng Shang, Qun Liu, Kam-Fai Wong
TL;DR
DAST tackles under-sampling of difficult queries in LLM self-training by introducing a difficulty-aware loop that estimates query difficulty $d_i$, augments training data accordingly, and refines the model via SFT and DPO. It uses a sampling-based estimation with an initial policy $\mathcal{M}_0$ and a difficulty partition across levels $E$, $M$, $H$, and $U$, along with data proportion control and difficulty-matched prompting to adjust response lengths. Empirical results on GSM8K, MATH, TAL-SCQ, College, and TheoremQA show improved math reasoning and generalization, particularly on out-of-domain tasks, with DAST-S and DAST-D outperforming baselines. The findings highlight the importance of explicitly incorporating task difficulty into self-training to achieve data-efficient gains in large language models.
Abstract
Present Large Language Model (LLM) self-training methods always under-sample challenging queries, leading to inadequate learning on difficult problems, which limits LLMs' ability. This work therefore proposes a difficulty-aware self-training (DAST) framework that improves both the quantity and the quality of self-generated responses on challenging queries during self-training. DAST comprises three components: 1) sampling-based difficulty level estimation, 2) difficulty-aware data augmentation, and 3) the self-training algorithm, using SFT and DPO respectively. Experiments on mathematical tasks demonstrate the effectiveness and generalization of DAST, highlighting the critical role of difficulty-aware strategies in advancing LLM self-training.
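To make the first component concrete, the following is a minimal sketch of sampling-based difficulty estimation, not the authors' implementation. It assumes that the difficulty $d_i$ of a query is the fraction of $k$ responses sampled from the initial policy $\mathcal{M}_0$ that are incorrect, and that $U$ denotes queries for which no sampled response is correct. The function names, the thresholds `easy_max` and `medium_max`, and the default `k` are hypothetical and chosen only for illustration.

```python
from typing import Callable


def estimate_difficulty(
    query: str,
    sample_fn: Callable[[str], str],
    is_correct: Callable[[str, str], bool],
    k: int = 8,
) -> float:
    """Return the fraction of k sampled responses that are incorrect.

    sample_fn stands in for drawing one response from the initial policy;
    is_correct stands in for an answer checker. Both are hypothetical hooks.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    failures = 0
    for _ in range(k):
        response = sample_fn(query)
        if not is_correct(query, response):
            failures += 1
    return failures / k


def difficulty_level(d: float, easy_max: float = 0.25, medium_max: float = 0.75) -> str:
    """Map a failure rate in [0, 1] to one of the levels E, M, H, U.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    if not 0.0 <= d <= 1.0:
        raise ValueError("d must lie in [0, 1]")
    if d == 1.0:
        return "U"  # assumed: every sampled response was incorrect
    if d <= easy_max:
        return "E"
    if d <= medium_max:
        return "M"
    return "H"
```

Under these assumptions, a query whose sampled responses are all correct falls in $E$, while one with no correct response falls in $U$; a difficulty-aware augmentation step could then draw more samples for the $H$ and $U$ levels.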
