LIFT: Improving Long Context Understanding Through Long Input Fine-Tuning

Yansheng Mao, Jiaqi Li, Fanxu Meng, Jing Xiong, Zilong Zheng, Muhan Zhang

TL;DR

LIFT is an on-the-fly fine-tuning framework that improves long-context understanding in short-context models by adapting the model's parameters to a long input at test time. It trains on overlapping segments of the input and can optionally add auxiliary QA tasks (AT) and pre-LIFT supervised fine-tuning. Processing is memory-efficient and scales linearly with input length. Combined with in-context learning (ICL), LIFT yields substantial gains on benchmarks such as LooGLE and LongBench, especially for models with smaller context windows. LIFT improves certain long-dependency tasks and can outperform standard ICL on several long-context tasks, though its effectiveness is task-dependent and auxiliary tasks alone do not always help. The work highlights practical directions for future research, including improved auxiliary-task design, better integration with retrieval, and strategies to exploit the parametric knowledge learned during LIFT for downstream tasks.
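To make "training on overlapping input segments" concrete, here is a minimal sketch of how a long token sequence might be cut into overlapping windows before fine-tuning. It is an illustration only, not the authors' implementation: the function name `overlapping_segments` and the parameters `seg_len` and `stride` are hypothetical, and the segment length and overlap used in the paper are not stated in this summary.

```python
from typing import List, Sequence


def overlapping_segments(
    tokens: Sequence[int], seg_len: int, stride: int
) -> List[List[int]]:
    """Split `tokens` into windows of at most `seg_len` tokens.

    A new window starts every `stride` tokens, so consecutive windows
    share `seg_len - stride` tokens. Both parameters are assumptions
    made for this sketch, not values taken from the paper.
    """
    if seg_len <= 0 or stride <= 0:
        raise ValueError("seg_len and stride must be positive")
    if stride > seg_len:
        raise ValueError("stride must not exceed seg_len, or tokens would be skipped")

    segments: List[List[int]] = []
    start = 0
    while start < len(tokens):
        segments.append(list(tokens[start:start + seg_len]))
        # Stop once the current window already reaches the end of the input.
        if start + seg_len >= len(tokens):
            break
        start += stride
    return segments


if __name__ == "__main__":
    # Toy "token ids" standing in for a tokenized long document.
    for segment in overlapping_segments(list(range(10)), seg_len=4, stride=2):
        print(segment)
```

Setting `stride` equal to `seg_len` gives disjoint windows, which presumably corresponds to the "trivial segmentation" that Figure 1 contrasts with the paper's method. With a smaller stride, text that falls on a window boundary also appears in the interior of a neighboring window.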

Abstract

Long context understanding remains challenging for large language models due to their limited context windows. This paper introduces Long Input Fine-Tuning (LIFT) for long context modeling, a novel framework that enhances LLM performance on long-context tasks by adapting model parameters to the context at test time. LIFT enables efficient processing of lengthy inputs without the computational burden of offline long-context adaptation, and can improve the long-context capabilities of arbitrary short-context models. The framework is further enhanced by integrating in-context learning and pre-LIFT supervised fine-tuning. The combination of in-context learning and LIFT enables short-context models like Llama 3 to handle arbitrarily long contexts and consistently improves their performance on popular long-context benchmarks like LooGLE and LongBench. We also provide a comprehensive analysis of the strengths and limitations of LIFT on long context understanding, offering valuable directions for future research.

Paper Structure

This paper contains 32 sections, 6 equations, 4 figures, 8 tables.

Figures (4)

  • Figure 1: Comparison between our segmentation method and the trivial segmentation method.
  • Figure 2: An overview of our method compared with existing methods like truncation, RAG, and long context adaptation.
  • Figure 3: GPU time vs. input length for LIFT and ICL. The dashed lines represent the fitted curves, showing linear growth for LIFT and quadratic growth for ICL. The red cross indicates the input length at which ICL runs out of memory.
  • Figure 4: Performance on NIAH: ICL (top) vs. LIFT+ICL (bottom).