LIFT: Improving Long Context Understanding of Large Language Models through Long Input Fine-Tuning

Yansheng Mao, Yufei Xu, Jiaqi Li, Fanxu Meng, Haotong Yang, Zilong Zheng, Xiyuan Wang, Muhan Zhang

TL;DR

This work tackles long-context understanding in large language models by proposing Long Input Fine-Tuning (LIFT), which stores a long input in model parameters through training on overlapping segments, auxiliary QA tasks, contextualized training, and a Gated Memory adapter. The method combines a segmentation-based memorization objective with task-aligned auxiliary supervision and a parameter-efficient attention adapter that balances long-input memorization against in-context learning (ICL). On LooGLE and LongBench, LIFT improves significantly over truncated in-context learning across multiple base models and is notably more efficient during generation. The approach also shows trade-offs: ICL ability can suffer and some tasks occasionally degrade, which points to future work on distillation, task design, and broader corpora to strengthen parametric memorization without sacrificing generalization.
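
To make "segmented overlapping training" concrete, the following is a minimal sketch of how a long token sequence might be cut into overlapping windows before fine-tuning. It is an illustration under stated assumptions, not the authors' implementation: the function name `overlapping_segments` and the parameters `seg_len` and `stride` are hypothetical, and the segment length and overlap that LIFT actually uses are not given in this summary. Setting `stride` equal to `seg_len` gives non-overlapping chunks, which is assumed here to correspond to the "trivial segmentation" that Figure 2 contrasts with.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def overlapping_segments(tokens: Sequence[T], seg_len: int, stride: int) -> List[List[T]]:
    """Split `tokens` into windows of at most `seg_len` items.

    Consecutive windows start `stride` items apart, so neighbours share
    `seg_len - stride` items. All names and values are illustrative
    assumptions, not settings taken from the paper.
    """
    if seg_len <= 0 or stride <= 0:
        raise ValueError("seg_len and stride must be positive")
    if stride > seg_len:
        raise ValueError("stride larger than seg_len would skip tokens")
    if len(tokens) == 0:
        return []

    segments: List[List[T]] = []
    start = 0
    while True:
        segments.append(list(tokens[start:start + seg_len]))
        # Stop once the current window reaches the end of the input.
        if start + seg_len >= len(tokens):
            break
        start += stride
    return segments


# Toy input: integers stand in for token ids.
toy_tokens = list(range(10))
overlapped = overlapping_segments(toy_tokens, seg_len=4, stride=2)
disjoint = overlapping_segments(toy_tokens, seg_len=4, stride=4)
```

With overlap, every boundary between two windows also falls inside another window, so text that spans a boundary is still seen contiguously at least once during training. With disjoint chunks, that cross-boundary context is never seen in one piece.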

Abstract

Long context understanding remains challenging for large language models due to their limited context windows. This paper presents Long Input Fine-Tuning (LIFT), a novel framework for long-context modeling that can improve the long-context performance of arbitrary (short-context) LLMs by dynamically adapting model parameters based on the long input. Importantly, rather than endlessly extending the context window to accommodate ever longer inputs in context, LIFT stores and absorbs the long input in the model's parameters. By fine-tuning the long input into model parameters, LIFT allows short-context LLMs to answer questions even when the required information is not provided in the context during inference. Furthermore, to enhance LIFT performance while maintaining the original in-context learning (ICL) capabilities, we introduce Gated Memory, a specialized attention adapter that automatically balances long input memorization and ICL. We provide a comprehensive analysis of the strengths and limitations of LIFT on long context understanding, offering valuable directions for future research.

Paper Structure

This paper contains 26 sections, 10 equations, 6 figures, and 10 tables.

Figures (6)

  • Figure 1: An overview of LIFT compared with existing methods.
  • Figure 2: Comparison between our segmentation method and the trivial segmentation method.
  • Figure 3: The architecture of Gated Memory. The purple part is the added Gated Memory adapter, which fits the out-of-context attention; the green part is the original attention module. During training, only the Gated Memory part is trained.
  • Figure 4: Subfigures (a)-(c) illustrate the decoding speed comparison between LIFT and ICL given inputs of length 20K, 50K, and 100K.
  • Figure 5: Performance on NIAH: ICL (top) vs. LIFT (bottom).
  • ...and 1 more figure