P3: A Policy-Driven, Pace-Adaptive, and Diversity-Promoted Framework for data pruning in LLM Training

Yingxuan Yang, Huayi Wang, Muning Wen, Xiaoyun Mo, Qiuying Peng, Jun Wang, Weinan Zhang

TL;DR

This paper introduces P3, an adaptive framework that optimizes task-specific fine-tuning through iterative data pruning, and validates it on the reasoning scenarios APPS and MATH, demonstrating significant improvements over traditional data pruning methods.

Abstract

In the rapidly advancing field of Large Language Models (LLMs), effectively leveraging existing datasets during fine-tuning to maximize the model's potential is of paramount importance. This paper introduces P3, an adaptive framework aimed at optimizing the task-specific fine-tuning process through iterative data pruning. P3 consists of three key components: (1) Policy-driven Difficulty Measurement, which dynamically assesses data difficulty based on the model's real-time performance, replacing static metrics with adaptable evaluations; (2) Pace-Adaptive Selection, which leverages self-paced learning to progressively introduce more challenging data, thereby enhancing model capability; (3) Diversity Promotion, which incorporates a Determinantal Point Process (DPP) to ensure data diversity across epochs, enriching the learning process. We validate P3 on the reasoning scenarios APPS and MATH, demonstrating significant improvements over traditional data pruning methods. By advancing dynamic data selection and utilization strategies, P3 contributes both a theoretical framework and a concrete approach for fully exploiting existing data to improve LLM performance, offering utility across diverse tasks.
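
The three components described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that a per-sample difficulty score is already available (for example, derived from the current model's loss), that diversity is measured with an RBF kernel over sample embeddings, and that a greedy determinant-maximizing selection stands in for DPP sampling. All function names (`self_paced_filter`, `greedy_dpp`, `select_for_epoch`) and parameters (`threshold`, `k`, `gamma`) are hypothetical.

```python
import math


def rbf_kernel(points, gamma=1.0):
    """Similarity kernel L[i][j] = exp(-gamma * ||x_i - x_j||^2) (assumed choice)."""
    n = len(points)
    return [
        [
            math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(points[i], points[j])))
            for j in range(n)
        ]
        for i in range(n)
    ]


def det(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-12:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def self_paced_filter(difficulty, threshold):
    """Keep indices whose difficulty is at most the current pace threshold."""
    return [i for i, d in enumerate(difficulty) if d <= threshold]


def greedy_dpp(kernel, candidates, k):
    """Greedily pick up to k candidates that maximize the kernel sub-determinant."""
    chosen = []
    remaining = list(candidates)

    def gain(i):
        idx = chosen + [i]
        return det([[kernel[r][c] for c in idx] for r in idx])

    while remaining and len(chosen) < k:
        best = max(remaining, key=gain)
        if gain(best) <= 1e-12:  # every remaining sample is redundant
            break
        chosen.append(best)
        remaining.remove(best)
    return chosen


def select_for_epoch(difficulty, embeddings, threshold, k):
    """Pace-adaptive filtering followed by diversity-promoting selection."""
    candidates = self_paced_filter(difficulty, threshold)
    return greedy_dpp(rbf_kernel(embeddings), candidates, k)


# Toy data: two near-duplicate pairs plus one hard, distinct sample.
difficulty = [0.1, 0.2, 0.3, 0.4, 0.9]
embeddings = [(0.0, 0.0), (0.01, 0.0), (5.0, 5.0), (5.0, 5.01), (10.0, 0.0)]
early = select_for_epoch(difficulty, embeddings, threshold=0.5, k=2)
late = select_for_epoch(difficulty, embeddings, threshold=1.0, k=3)
```

In this toy run, the early epoch excludes the hardest sample and avoids picking both members of a near-duplicate pair; raising the threshold in a later epoch admits the hard sample. In a real training loop the difficulty scores would be recomputed from the model after each epoch, which is what makes the measurement policy-driven.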

Paper Structure (28 sections, 10 equations, 5 figures, 5 tables, 1 algorithm)

Figures (5)

  • Figure 1: Training Data Analysis Across Epochs. The left graph shows the difficulty distribution of data selected in different epochs, illustrating our progressive training strategy from easier to more challenging tasks. The right graph, using t-SNE visualization, displays the uniform and diverse distribution of selected data.
  • Figure 2: Illustration of the P3 Framework, which embodies a Policy-driven, Pace-adaptive, and Diversity-Promoted approach to optimize Large Language Model training through three strategic stages.
  • Figure 3: APPS dataset comparison over 5 epochs. Test accuracy and BLEU score show that our P3 method outperforms other baselines with faster learning and stronger results.
  • Figure 4: Effect of data size on accuracy over 5 epochs.
  • Figure 5: Effect of the number of epochs on accuracy with 300 samples.