
Dancing in Chains: Reconciling Instruction Following and Faithfulness in Language Models

Zhengxuan Wu, Yuhao Zhang, Peng Qi, Yumo Xu, Rujun Han, Yian Zhang, Jifan Chen, Bonan Min, Zhiheng Huang

TL;DR

The paper identifies a trade-off between instruction following and faithfulness in language model alignment when models are trained on datasets with conflicting objectives. It shows that fine-tuning on one objective can degrade the other, and that standard multi-task learning with data mixing struggles to reconcile the two goals. The authors introduce ReSet (Rejection Sampling for Continued Self-instruction Tuning) and show that it substantially improves faithfulness while preserving instruction-following performance, even with a relatively small, high-quality data budget. The work offers a practical pathway to more reliable, grounded, and user-aligned LMs and highlights the importance of data quality and objective alignment in multi-task training.

Abstract

Modern language models (LMs) need to follow human instructions while being faithful; yet, they often fail to achieve both. Here, we provide concrete evidence of a trade-off between instruction following (i.e., following open-ended instructions) and faithfulness (i.e., grounding responses in the given context) when training LMs with these objectives. For instance, fine-tuning LLaMA-7B on instruction-following datasets renders it less faithful. Conversely, instruction-tuned Vicuna-7B shows degraded performance at following instructions when further optimized on tasks that require contextual grounding. One common remedy is multi-task learning (MTL) with data mixing, yet it remains far from achieving a synergistic outcome. We propose a simple yet effective method that relies on Rejection Sampling for Continued Self-instruction Tuning (ReSet), which significantly outperforms vanilla MTL. Surprisingly, we find that less is more: training ReSet on high-quality yet substantially smaller data (one-third the size) yields superior results. Our findings offer a better understanding of objective discrepancies in the alignment training of LMs.
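The abstract names ReSet but does not spell out its procedure. As a rough illustration of the generic rejection-sampling idea it builds on, the sketch below samples several responses from the model for each prompt, scores each one, and keeps the best response only if it clears a threshold; the kept examples would then feed another round of tuning. Everything here is an assumption for illustration: the function names (`rejection_sample`, `toy_generate`, `toy_faithfulness`, `toy_instruction_following`), the use of `min` to combine a faithfulness score with an instruction-following score, the threshold, and the toy scorers are hypothetical and are not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical signatures: illustrative assumptions, not the paper's API.
Generator = Callable[[str, str, int], List[str]]  # (instruction, context, n) -> candidates
Scorer = Callable[[str, str, str], float]         # (instruction, context, response) -> [0, 1]


@dataclass
class Example:
    instruction: str
    context: str
    response: str


def rejection_sample(
    prompts: Sequence[Tuple[str, str]],
    generate: Generator,
    scorers: Sequence[Scorer],
    num_samples: int = 4,
    threshold: float = 0.5,
) -> List[Example]:
    """Keep, per prompt, the best self-generated response if it clears the threshold.

    A candidate's combined score is the minimum over all scorers (an assumption),
    so it must score well on every objective to be kept.
    """
    kept: List[Example] = []
    for instruction, context in prompts:
        candidates = generate(instruction, context, num_samples)
        if not candidates:
            continue
        combined = [
            (min(s(instruction, context, c) for s in scorers), c) for c in candidates
        ]
        best_score, best = max(combined, key=lambda pair: pair[0])
        if best_score >= threshold:
            kept.append(Example(instruction, context, best))
    # The kept examples would then serve as data for a further round of fine-tuning.
    return kept


# --- Toy stand-ins so the sketch runs end to end (all hypothetical) ---
def toy_generate(instruction: str, context: str, n: int) -> List[str]:
    # Deterministic placeholder for sampling n responses from the model.
    return [context, "unrelated answer", context.upper()][:n]


def toy_faithfulness(instruction: str, context: str, response: str) -> float:
    # Fraction of response tokens that also occur in the context.
    ctx = set(context.lower().split())
    toks = response.lower().split()
    return sum(t in ctx for t in toks) / len(toks) if toks else 0.0


def toy_instruction_following(instruction: str, context: str, response: str) -> float:
    # Crude placeholder: reward any non-empty response that is not all caps.
    return 1.0 if response and not response.isupper() else 0.0


if __name__ == "__main__":
    data = rejection_sample(
        [("Summarize.", "the cat sat"), ("Answer.", "")],
        toy_generate,
        [toy_faithfulness, toy_instruction_following],
        num_samples=3,
    )
    print([ex.response for ex in data])
```

Taking the minimum rather than the mean is one way to encode "both objectives must hold"; a response that copies the context in all caps is faithful under the toy scorer but is still rejected because it fails the instruction-following check.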

Paper Structure

This paper contains 41 sections, 1 equation, 14 figures, and 15 tables.

Figures (14)

  • Figure 1: Faithfulness scores on context-dependent tasks (QA and summarization) decrease when we fine-tune a grounded LLaMA-7B checkpoint on instruction-following datasets (orange), and instruction-following scores (assessed by GPT-4) decrease when we fine-tune Vicuna-7B on context-dependent tasks (blue). Our method, ReSet, surpasses vanilla MTL with data mixing, approaching the North Star (upper-right corner).
  • Figure 2: Two-stage fine-tuning with LLaMA-7B.
  • Figure 3: Macro-averaged faithfulness, instruction-following, and task performance scores on the corresponding evaluation datasets before and after fine-tuning on instruction-following datasets.
  • Figure 4: Average generation token length throughout the instruction-following training stage. The first checkpoint is the best checkpoint from the context-dependent training stage. The middle checkpoint is the one with the lowest evaluation loss during the second stage.
  • Figure 5: Faithfulness scores for abstractive QA and summarization datasets, categorized by whether the generation is strictly shorter than the gold answer or much longer than it (> 100 tokens).
  • ...and 9 more figures
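The length split in Figure 5 can be sketched as a simple bucketing step. This is a minimal illustration under stated assumptions: token lengths are taken as precomputed (the tokenizer is not specified here), "much longer (> 100 tokens)" is read as the generation exceeding the gold answer's length by more than 100 tokens, and the function name `bucket_by_length` is hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple


def bucket_by_length(
    records: List[Tuple[int, int, float]],  # (generation_len, gold_len, faithfulness)
    margin: int = 100,  # assumed reading of "> 100 tokens" as a length gap
) -> Dict[str, float]:
    """Average faithfulness for strictly shorter vs. much longer generations."""
    buckets: Dict[str, List[float]] = {"shorter": [], "much_longer": []}
    for gen_len, gold_len, score in records:
        if gen_len < gold_len:
            buckets["shorter"].append(score)
        elif gen_len - gold_len > margin:
            buckets["much_longer"].append(score)
        # Generations in between fall into neither bucket and are ignored.
    return {name: mean(scores) for name, scores in buckets.items() if scores}


if __name__ == "__main__":
    print(bucket_by_length([(10, 20, 0.8), (15, 30, 0.6), (250, 40, 0.3), (60, 50, 0.9)]))
```

Empty buckets are omitted from the result rather than reported as zero, so a missing key means "no generations in this length range" rather than "zero faithfulness".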