Phased Instruction Fine-Tuning for Large Language Models

Wei Pang; Chuan Zhou; Xiao-Hua Zhou; Xiaojie Wang

TL;DR

Experiments with Llama-2 7B/13B/70B, Llama-3 8B/70B and Mistral-7B models using Alpaca data show that Phased IFT significantly outperforms One-off IFT, supporting the progressive alignment hypothesis and providing a simple and efficient way to enhance large language models.

Abstract

Instruction Fine-Tuning advances pre-trained language models from basic next-word prediction to complex instruction following. However, the existing One-off Instruction Fine-Tuning (One-off IFT) method, which trains on a diverse instruction dataset in a single stage, may not effectively improve models' adherence to instructions because it must handle instructions of varying complexity simultaneously. To address this, we propose Phased Instruction Fine-Tuning (Phased IFT), based on the idea that learning to follow instructions is a gradual process. Phased IFT assesses instruction difficulty using GPT-4, divides the instruction data into subsets of increasing difficulty, and uptrains the model sequentially on these subsets. Experiments with Llama-2 7B/13B/70B, Llama-3 8B/70B and Mistral-7B models using Alpaca data show that Phased IFT significantly outperforms One-off IFT, supporting the progressive alignment hypothesis and providing a simple and efficient way to enhance large language models. Code and datasets from our experiments are freely available at https://github.com/xubuvd/PhasedSFT.
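
The three steps described in the abstract (score, stratify, uptrain sequentially) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (that lives in the linked repository): it assumes every instruction already carries a numeric difficulty score under a hypothetical "difficulty" key, and the names `split_into_stages` and `phased_uptrain`, the threshold values and the `train_one_stage` callback are all hypothetical placeholders. One-off IFT would correspond to a single `train_one_stage` call on the full dataset.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

# An instruction record; only the hypothetical "difficulty" key is used here.
Example = Dict[str, object]
Model = TypeVar("Model")


def split_into_stages(examples: Sequence[Example],
                      thresholds: Sequence[float]) -> List[List[Example]]:
    """Partition examples into len(thresholds) + 1 subsets of increasing difficulty.

    Stage k keeps examples whose score lies in [bounds[k], bounds[k + 1]),
    where bounds = [-inf, *thresholds, +inf]; thresholds must be ascending.
    """
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be sorted in ascending order")
    bounds = [float("-inf"), *thresholds, float("inf")]
    stages: List[List[Example]] = [[] for _ in range(len(bounds) - 1)]
    for ex in examples:
        score = float(ex["difficulty"])  # assumed: a precomputed difficulty rating
        for k in range(len(stages)):
            if bounds[k] <= score < bounds[k + 1]:
                stages[k].append(ex)
                break
    return stages


def phased_uptrain(model: Model,
                   stages: Sequence[Sequence[Example]],
                   train_one_stage: Callable[[Model, Sequence[Example]], Model]) -> Model:
    """Train on each stage in order, starting each stage from the previous checkpoint."""
    for stage in stages:
        model = train_one_stage(model, stage)
    return model


# Toy data: six instructions with made-up difficulty scores.
data = [{"instruction": f"q{i}", "difficulty": s}
        for i, s in enumerate([1.5, 3.0, 4.5, 6.0, 8.5, 2.0])]
stages = split_into_stages(data, thresholds=[3.5, 6.5])
print([len(s) for s in stages])  # [3, 2, 1]

# Stand-in "trainer" that only logs stage sizes instead of updating weights.
log = phased_uptrain([], stages, lambda m, s: m + [len(s)])
print(log)  # [3, 2, 1]
```

The key design point is that `phased_uptrain` threads the model through the stages, so each stage begins from the checkpoint produced by the easier stage before it rather than from the base model.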

Paper Structure (16 sections, 1 equation, 8 figures, 6 tables)

Introduction
Related Work
Method
Experiments
Experiment 1: Comparison of win rates between Phased IFT and One-off IFT
Experiment 2: Comparison of win rates between difficulty-stratified 3-stages and randomly sampled 3-stages
Experiment 3: Comparison of win rates from uptraining on Alpaca-3-stages across all permutations
Ablation Studies
Conclusion
Broader Impact
Limitations
Appendix
Comparison of percentage distribution of difficulty scores on the Alpaca52K dataset using GPT-3.5 and GPT-4
Histogram and density curve for Alpaca, Alpaca-cleaned and LIMA
ChatGPT-4 for evaluating win rate
...and 1 more section
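
The experiments listed above report win rates from pairwise judgments by GPT-4. The exact win-rate formula is not stated here, so the sketch below assumes a common convention in which a tie counts as half a win; it also enumerates the 3! = 6 stage orderings that an "all permutations" comparison over three stages covers. The function names and the "win"/"tie"/"loss" labels are hypothetical.

```python
from collections import Counter
from itertools import permutations
from typing import Iterable, List, Tuple

VERDICTS = {"win", "tie", "loss"}


def win_rate(verdicts: Iterable[str]) -> float:
    """Win rate of model A over model B from per-prompt judge verdicts.

    Each verdict is "win", "tie" or "loss" from A's point of view. Scoring a
    tie as half a win is an assumed convention, not taken from the paper.
    """
    counts = Counter(verdicts)
    unknown = set(counts) - VERDICTS
    if unknown:
        raise ValueError(f"unexpected verdicts: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no verdicts given")
    return (counts["win"] + 0.5 * counts["tie"]) / total


def stage_orderings(n_stages: int = 3) -> List[Tuple[int, ...]]:
    """Every order in which n stages (0 = easiest) could be uptrained."""
    return list(permutations(range(n_stages)))


verdicts = ["win"] * 6 + ["tie"] * 2 + ["loss"] * 2
print(win_rate(verdicts))        # 0.7
orders = stage_orderings(3)
print(len(orders), orders[0])    # 6 (0, 1, 2)
```

The ordering (0, 1, 2), easy to hard, is the Phased IFT schedule; the other five orderings serve as comparison points.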

Figures (8)

Figure 1: Win-rate growth of uptraining on multi-stage sub-datasets of increasing difficulty (Phased IFT) relative to One-off IFT on the original dataset. The gray horizontal line marks the One-off IFT baseline.
Figure 2: Overview of the proposed Phased Instruction Fine-Tuning (Phased IFT).
Figure 3: A prompt to ChatGPT-4 for scoring instruction difficulty.
Figure 4: Histogram and cumulative probability density of difficulty scores for Alpaca 52K dataset.
Figure 5: Histogram and cumulative probability density of difficulty scores for Alpaca-cleaned 52K dataset.
...and 3 more figures
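
Figures 4 and 5 summarize difficulty scores as a histogram with a cumulative curve. One plausible way to compute both quantities from a list of scores is sketched below; it assumes integer-valued scores, and the sample scores and the function name `histogram_and_cdf` are made up for illustration. The cumulative curve here is the empirical cumulative distribution, P(score ≤ s), evaluated at each observed score.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def histogram_and_cdf(scores: Sequence[int]) -> Tuple[Dict[int, int], List[Tuple[int, float]]]:
    """Return per-score counts and the empirical CDF at each observed score."""
    if not scores:
        raise ValueError("scores must be non-empty")
    counts = Counter(scores)
    n = len(scores)
    cdf: List[Tuple[int, float]] = []
    running = 0
    for s in sorted(counts):
        running += counts[s]          # number of scores <= s
        cdf.append((s, running / n))  # fraction of scores <= s
    return dict(sorted(counts.items())), cdf


hist, cdf = histogram_and_cdf([3, 5, 5, 7, 7, 7, 9, 2])
print(hist)     # {2: 1, 3: 1, 5: 2, 7: 3, 9: 1}
print(cdf[-1])  # (9, 1.0)
```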
