Don't Half-listen: Capturing Key-part Information in Continual Instruction Tuning

Yongquan He, Wenyuan Zhang, Xuancheng Huang, Peng Zhang, Lingxun Meng, Xiang Zhou, Ke Zeng, Xunliang Cai

TL;DR

This work tackles continual instruction tuning of LLMs by addressing two problems: catastrophic forgetting and a "half-listening" tendency, in which models latch onto surface-level instruction patterns instead of task-specific details. It introduces KPIG, a framework that masks key parts of instructions to compute information gain, which then guides both selective replay of earlier tasks and a dynamic training objective that emphasizes task-aware cues. The approach outperforms prior methods on both seen and held-out tasks, as measured in part by two new metrics, P-score and V-score, which capture generalization and instruction-following ability. The method offers a practical route to LLMs that stay aligned with their instructions as new tasks arrive.
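The central quantity can be illustrated with a small sketch. It assumes, as a hypothetical reading not confirmed by this excerpt, that key-part information gain (IG) is the difference between the model's average per-token log-likelihood of the correct response given the full instruction and given the instruction with its key part masked. The function names and the length normalization are illustrative choices, not the paper's definitions.

```python
from typing import Sequence


def mean_log_prob(token_log_probs: Sequence[float]) -> float:
    """Average per-token log-probability of a response (length-normalized)."""
    if not token_log_probs:
        raise ValueError("response must contain at least one token")
    return sum(token_log_probs) / len(token_log_probs)


def key_part_information_gain(
    full_log_probs: Sequence[float],
    masked_log_probs: Sequence[float],
) -> float:
    """Hypothetical IG: how much the unmasked key part raises the likelihood
    of the correct response. Both inputs are per-token log-probs of the SAME
    response, scored once with the full instruction and once with the key
    part masked out."""
    if len(full_log_probs) != len(masked_log_probs):
        raise ValueError("both scores must cover the same response tokens")
    return mean_log_prob(full_log_probs) - mean_log_prob(masked_log_probs)


if __name__ == "__main__":
    # Toy numbers: the response becomes much less likely once the key part
    # is hidden, so the key part carries information (positive IG).
    full = [-0.1, -0.2, -0.3]
    masked = [-0.5, -0.9, -1.0]
    print(round(key_part_information_gain(full, masked), 4))  # 0.6
```

Under this reading, an IG near zero means the model produces the response about as well without the key part, which is one way to quantify half-listening.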

Abstract

Instruction tuning can drive large language models (LLMs) to produce results consistent with human goals in specific downstream tasks. However, continual instruction tuning (CIT) of LLMs can cause catastrophic forgetting (CF), in which previously learned abilities degrade. Recent methods try to alleviate CF by modifying models or replaying data, but the resulting models may only remember the surface-level patterns of instructions and become confused on held-out tasks. In this paper, we propose a novel continual instruction tuning method based on Key-part Information Gain (KPIG). Our method computes the information gain on masked parts of instructions to dynamically replay data and refine the training objective, which enables LLMs to capture task-aware information relevant to the correct response and alleviates overfitting to general descriptions in instructions. In addition, we propose two metrics, P-score and V-score, to measure the generalization and instruction-following abilities of LLMs. Experiments demonstrate that our method achieves superior performance on both seen and held-out tasks.
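A minimal sketch of IG-guided replay follows. It rests on two assumptions not stated in this excerpt: that the examples from earlier tasks with the lowest key-part IG (those the current model answers almost as well without the key part) are the most worth replaying, and that a fixed budget of N examples is replayed per previous task. Figure 5 studies a parameter $N$, but its exact role is not given here. The `Example` type, its `info_gain` field, and `select_replay` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Example:
    task: str
    text: str
    info_gain: float  # key-part IG under the current model (hypothetical field)


def select_replay(pool: List[Example], n_per_task: int) -> List[Example]:
    """Pick the n_per_task lowest-IG examples from each previous task."""
    if n_per_task < 0:
        raise ValueError("n_per_task must be non-negative")
    by_task: Dict[str, List[Example]] = {}
    for ex in pool:
        by_task.setdefault(ex.task, []).append(ex)
    selected: List[Example] = []
    for task in sorted(by_task):  # deterministic order across tasks
        ranked = sorted(by_task[task], key=lambda ex: ex.info_gain)
        selected.extend(ranked[:n_per_task])  # lowest IG first
    return selected


if __name__ == "__main__":
    pool = [
        Example("task_A", "a", 0.9),
        Example("task_A", "b", 0.1),
        Example("task_A", "c", 0.4),
        Example("task_B", "d", 0.05),
        Example("task_B", "e", 0.7),
    ]
    print([ex.text for ex in select_replay(pool, 2)])  # ['b', 'c', 'd', 'e']
```

To make the replay dynamic, IG would be recomputed with the current model before each selection round, so the replayed set shifts as the model changes.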

Paper Structure

This paper contains 28 sections, 5 equations, 6 figures, 12 tables, 1 algorithm.

Figures (6)

  • Figure 1: Task confusion on item classification (IC) after training on merchant classification (MC). IC is a held-out task used only for evaluation. Because the two tasks have similar instructions, the LLM at $t$ generates more illegal categories, i.e., categories defined in MC ($36.4\% \rightarrow 49.6\%$).
  • Figure 2: The continual instruction tuning framework of KPIG. In the instruction diversity stage, we prompt GPT-4 to pay closer attention to key parts when rewriting instructions. In the information gain (IG) fine-tuning stage, we dynamically replay previous tasks under our IG-based learning objective to alleviate the half-listening problem.
  • Figure 3: An example of the instruction diversity.
  • Figure 4: Trends in information gain, loss, P-score, and V-score on Sup-NatInst-ST over training steps.
  • Figure 5: The impact of $N$ on model performance.
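Figure 2 describes replaying previous tasks under an IG-based learning objective. This excerpt does not give that objective, so the following is only a hypothetical sketch of one way IG could enter the loss: the usual negative log-likelihood on the full instruction, plus a hinge penalty whenever key-part IG falls below a margin. `ig_aware_loss`, `margin`, and `weight` are illustrative names and values, not taken from the paper.

```python
from typing import Sequence


def mean_log_prob(token_log_probs: Sequence[float]) -> float:
    """Length-normalized log-likelihood of a response."""
    if not token_log_probs:
        raise ValueError("response must contain at least one token")
    return sum(token_log_probs) / len(token_log_probs)


def ig_aware_loss(
    full_log_probs: Sequence[float],
    masked_log_probs: Sequence[float],
    margin: float = 0.5,
    weight: float = 1.0,
) -> float:
    """Hypothetical IG-aware loss (not the paper's exact formula).

    nll     : standard loss on the response given the full instruction
    penalty : grows as IG drops below `margin`, i.e. when masking the key
              part barely changes the response likelihood (half-listening)
    """
    full = mean_log_prob(full_log_probs)
    masked = mean_log_prob(masked_log_probs)
    nll = -full
    info_gain = full - masked
    penalty = max(0.0, margin - info_gain)
    return nll + weight * penalty


if __name__ == "__main__":
    # Half-listening case: hiding the key part barely hurts (IG = 0.1 < margin).
    print(round(ig_aware_loss([-0.2, -0.2], [-0.3, -0.3]), 4))  # 0.6
    # Key part clearly used (IG = 0.8 >= margin): only the NLL remains.
    print(round(ig_aware_loss([-0.2, -0.2], [-1.0, -1.0]), 4))  # 0.2
```

The sketch uses plain floats for readability. In a real training loop these would be tensors of token log-probabilities, so gradients would flow through both the NLL and the penalty term.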