A Closer Look at the Limitations of Instruction Tuning

Sreyan Ghosh; Chandra Kiran Reddy Evuru; Sonal Kumar; Ramaneswaran S; Deepali Aneja; Zeyu Jin; Ramani Duraiswami; Dinesh Manocha

TL;DR

This work critically examines whether Instruction Tuning (IT) actually adds knowledge or skills to large language models. It systematically compares LoRA-based fine-tuning (LFT) with full-parameter fine-tuning (SFT) across multiple open-source LLMs and IT datasets, using both human and GPT-4 multi-aspect evaluations plus token-distribution analyses. The key findings are that LFT largely preserves pre-trained knowledge and yields better factuality, while SFT introduces new knowledge at the cost of content quality and increases hallucinations by inaccurately borrowing tokens from conceptually similar instances in the IT data. Pattern copying from IT data generally harms performance, though simplifying IT responses can mitigate hallucinations. The study concludes that pre-trained knowledge remains the dominant factor and suggests future work on mitigating SFT-induced hallucinations and on hybrid approaches that combine concise IT data with strong pre-trained grounding. Overall, the paper offers guidance for building robust open-domain chat models by showing where IT falls short and where it can still be effective.

Abstract

Instruction Tuning (IT), the process of training large language models (LLMs) using instruction-response pairs, has emerged as the predominant method for transforming base pre-trained LLMs into open-domain conversational agents. While IT has achieved notable success and widespread adoption, its limitations and shortcomings remain underexplored. In this paper, through rigorous experiments and an in-depth analysis of the changes LLMs undergo through IT, we reveal various limitations of IT. In particular, we show that (1) IT fails to enhance knowledge or skills in LLMs. LoRA fine-tuning is limited to learning response initiation and style tokens, and full-parameter fine-tuning leads to knowledge degradation. (2) Copying response patterns from IT datasets derived from knowledgeable sources leads to a decline in response quality. (3) Full-parameter fine-tuning increases hallucination by inaccurately borrowing tokens from conceptually similar instances in the IT dataset for generating responses. (4) Popular methods to improve IT do not lead to performance improvements over a simple LoRA fine-tuned model. Our findings reveal that responses generated solely from pre-trained knowledge consistently outperform responses by models that learn any form of new knowledge from IT on open-source datasets. We hope the insights and challenges revealed in this paper inspire future work in related directions.

Paper Structure (19 sections, 1 equation, 12 figures, 17 tables)

Introduction
Experimental Setting
IT is (currently) Not a Knowledge Enhancer
Pattern Copying (often) Hurts Performance
Causal Analysis of Hallucinations
Methods to Improve IT are Ineffective
Related Work
Conclusion
Limitations and Future Work
Reproducibility
Acknowledgements
Prompts
Qualitative Examples
Dataset Details
Training
...and 4 more sections

Figures (12)

Figure 1: Token distribution shift after IT. We compare token distributions between base pre-trained models and their IT-ed versions using 3 metrics defined in Section \ref{sec:IT_not_knowledge}. We show that (1) Overall, LFT experiences low token distribution shifts, indicating high alignment with pre-trained knowledge. (2) Shifts in SFT are much higher than in LFT. (3) LFT is unaffected by the scale of the IT dataset.
Figure 2: Dataset scaling is ineffective for LFT. We show that with LFT, a model's performance does not significantly improve when the IT dataset is scaled to 52$\times$ or 326$\times$ its original size.
Figure 3: Pre-trained knowledge outperforms new knowledge learned with SFT. We show that LFT with only 1000 samples outperforms SFT on 326$\times$ and 52$\times$ more samples on factuality and usefulness, both in an open domain (just-eval-instruct 1k) and in a knowledge-intensive domain (MedInstruct-test 216). While responses by the LFT model are most aligned with pre-trained knowledge, responses by the SFT models output new knowledge learned from IT.
Figure 4: KL Divergence analysis between the probability distribution of response tokens from fine-tuned models and their pre-trained only counterparts. We plot separately for tokens in the first 5% and the remaining 95% for each sentence in the response. LFT primarily learns to initiate individual sentences in the response, showing a higher distribution shift and, thereby, the introduction of novel tokens predominantly in the initial parts of every sentence in the response. SFT exhibits a more substantial and uniform distribution shift across the full span of the response.
Figure 5: Style Imitation affects response quality. Instructions 3, 4, and 5 illustrate examples of instances where the model, initially responding accurately, proceeds to generate hallucinated content. The suspected cause is style imitation, a process where the model, striving for lengthier, more detailed responses, fabricates information when it lacks sufficient knowledge. This hypothesis is further confirmed by comparing the responses to responses by another model fine-tuned on the simplified version of the same IT dataset. The hallucinations in Instructions 3 and 4 are not invented but are instead drawn from the IT dataset, a subject explored more comprehensively in Section \ref{sec:causal_hallucinations}. Moreover, Instruction 1 exemplifies the model's ability to generate an elaborate answer when it has adequate knowledge of the subject, whereas Instruction 2 demonstrates how merely imitating the style can alter the nature of a response to a reasoning task. Every response is also accompanied by Simplified Res., a response from a model fine-tuned on the same IT dataset but with simplified responses (detailed in Section \ref{sec:pattern_copying}). Notice how the Simplified Res. is less prone to hallucination by providing a brief response.
...and 7 more figures
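
The head-versus-tail comparison described in the Figure 4 caption can be illustrated with a minimal sketch. It is not the authors' code: the function names `kl_divergence` and `positional_kl`, the KL direction (fine-tuned relative to pre-trained), the rule that the head always contains at least one token, and the toy distributions are all assumptions made for illustration. The input is taken to be one next-token probability distribution per token position of a sentence, from the fine-tuned model and from its pre-trained counterpart.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    # eps guards against log(0) when q assigns zero mass to a token (assumption).
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def positional_kl(tuned, base, head_frac=0.05):
    """Mean per-token KL(tuned || base) for the head and the tail of a sentence.

    tuned, base: lists of next-token distributions, one per token position.
    head_frac:   fraction of leading tokens treated as the head (5% in Figure 4).
    """
    if not tuned or len(tuned) != len(base):
        raise ValueError("expected two non-empty, equally long sequences")
    per_token = [kl_divergence(p, q) for p, q in zip(tuned, base)]
    # Assumption: round up so that short sentences still have a one-token head.
    head_len = max(1, math.ceil(head_frac * len(per_token)))
    head, tail = per_token[:head_len], per_token[head_len:]

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return mean(head), mean(tail)


# Toy sentence of 4 tokens over a 3-word vocabulary (hypothetical numbers).
# Only the first position differs between the two models.
base = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4], [0.5, 0.4, 0.1]]
tuned = [[0.1, 0.2, 0.7], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4], [0.5, 0.4, 0.1]]
head_kl, tail_kl = positional_kl(tuned, base)
print(f"head KL = {head_kl:.3f}, tail KL = {tail_kl:.3f}")
```

In this toy input the shift is concentrated in the first position, which is the pattern the caption attributes to LFT; a model whose distributions differed at every position would instead give comparable head and tail values, as the caption describes for SFT.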
