Instruction Tuning with and without Context: Behavioral Shifts and Downstream Impact

Hyunji Lee; Seunghyun Yoon; Yunjae Won; Hanseok Oh; Geewook Kim; Trung Bui; Franck Dernoncourt; Elias Stengel-Eskin; Mohit Bansal; Minjoon Seo

TL;DR

This paper compares instruction tuning of LLMs on context-augmented data (Ctx-LLM) with tuning on context-free data (NoCtx-LLM) to understand how context affects knowledge use and downstream performance. In the text domain, context-augmented training strengthens grounding and shifts reliance away from parametric memory toward the provided evidence. Used as the backbone of a vision-language model, a context-trained LLM also reduces hallucination and improves grounding in the visual domain. For deployments where context availability varies, the authors compare two strategies: training a single model on a mixture of both data types, and routing inputs between two specialized models. They find that routing yields more robust overall performance because it better preserves the two models' complementary strengths. These insights inform practical data and system design for real-world applications.

Abstract

Instruction tuning is a widely used approach to improve the instruction-following ability of large language models (LLMs). Instruction-tuning datasets typically include a mixture of context-augmented and context-free examples, yet prior work has largely combined these data types without examining their distinct effects. In this paper, we investigate how training LLMs with or without context affects model behavior and downstream performance. First, in the text domain, we show that LLMs trained with context attend more strongly to the provided knowledge, achieving better grounding. We also observe that context-augmented training shifts how LLMs use knowledge: models store and leverage less parametric knowledge and instead depend more on the provided context. Second, we observe that using an LLM trained with context-augmented data as the backbone for vision-language models reduces hallucination and improves grounding in the visual domain. Finally, we explore practical strategies for real-world deployments where context availability varies. We show that maintaining separate context-augmented and context-free models and routing inputs between them yields more robust overall performance than training a single mixed model, as it better preserves their complementary strengths.
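The routing strategy mentioned at the end of the abstract can be illustrated with a minimal sketch. The abstract does not specify how inputs are routed, so the rule used here (send a query to the context-trained model only when non-empty context is supplied) is an assumption for illustration, and `ContextRouter`, `ctx_model`, and `noctx_model` are hypothetical names, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A "model" is any callable from prompt text to response text.
# This is a hypothetical stand-in for a real LLM call.
Model = Callable[[str], str]


@dataclass
class ContextRouter:
    ctx_model: Model    # stand-in for a model tuned on context-augmented data
    noctx_model: Model  # stand-in for a model tuned on context-free data

    def route(self, context: Optional[str]) -> str:
        # Assumed rule: pick the context-trained model only when
        # the caller supplies non-empty context.
        return "ctx" if context and context.strip() else "noctx"

    def answer(self, question: str, context: Optional[str] = None) -> str:
        if self.route(context) == "ctx":
            prompt = f"Context: {context}\nQuestion: {question}"
            return self.ctx_model(prompt)
        return self.noctx_model(f"Question: {question}")
```

A real system would likely need a richer routing signal than the mere presence of context, for example whether the supplied context is relevant to the question; that choice is left open here.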
