Bootstrap Your Own Context Length

Liang Wang; Nan Yang; Xingxing Zhang; Xiaolong Huang; Furu Wei

TL;DR

The paper addresses the challenge of training long-context LLMs when natural long-context data is scarce, by bootstrapping from the models' existing short-context capabilities. It introduces a multi-step data-synthesis pipeline driven by an agent workflow, coupled with progressive context-length training, to transfer short-context skills to long-context tasks. Experiments with open-source Llama-3 models show that the approach can extend the context length to up to one million tokens with competitive performance on various benchmarks, including the RULER suite and needle-in-haystack tasks. The work demonstrates the viability of data-centric strategies for practical long-context reasoning, and it outlines avenues for efficiency and architectural improvements in future research.

Abstract

We introduce a bootstrapping approach to train long-context language models by exploiting only their short-context capabilities. Our method uses a simple agent workflow to synthesize diverse long-context instruction tuning data, thereby eliminating the need for manual data collection and annotation. The proposed data synthesis workflow requires only a short-context language model, a text retriever, and a document collection, all of which are readily accessible within the open-source ecosystem. Language models are then fine-tuned on the synthesized data to extend their context lengths. In this manner, we effectively transfer the short-context capabilities of language models to long-context scenarios through a bootstrapping process. We conduct experiments with the open-source Llama-3 family of models and demonstrate that our method can successfully extend the context length to up to 1M tokens, achieving superior performance across various benchmarks.
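
To make the three ingredients named in the abstract concrete (a short-context language model, a text retriever, and a document collection), the following is a minimal sketch of how they might be combined into one synthetic training example. It is not the authors' pipeline: the function names, the prompts, the word-overlap retriever, and in particular the chunk-by-chunk note-taking step are all assumptions made for illustration. The only property taken from the text is that the model is never asked to read more than a short context at a time.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    instruction: str
    context: str   # long context built by concatenating retrieved documents
    response: str


def overlap_retriever(query: str, corpus: List[str], k: int) -> List[str]:
    """Toy stand-in for a text retriever: rank by words shared with the query."""
    q = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc: len(q & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def synthesize_example(
    seed_doc: str,
    corpus: List[str],
    short_llm: Callable[[str], str],  # hypothetical short-context model call
    k: int = 3,
    chunk_words: int = 50,
) -> Example:
    # Step 1 (assumed): the short-context model writes an instruction
    # based on a single seed document.
    instruction = short_llm(f"Write a question about: {seed_doc}")

    # Step 2 (assumed): the retriever gathers related documents; their
    # concatenation serves as the long context.
    docs = overlap_retriever(instruction, corpus, k)
    context = "\n\n".join(docs)

    # Step 3 (assumed): the model reads the long context one short chunk
    # at a time and writes a note for each chunk.
    words = context.split()
    notes = []
    for start in range(0, len(words), chunk_words):
        chunk = " ".join(words[start:start + chunk_words])
        notes.append(short_llm(f"Note facts relevant to '{instruction}' in: {chunk}"))

    # Step 4 (assumed): the final response is written from the notes only,
    # so every model call stays within a short context.
    response = short_llm(f"Answer '{instruction}' using notes: {' | '.join(notes)}")
    return Example(instruction, context, response)
```

Under these assumptions, each returned `Example` pairs a long `context` with an `instruction` and `response`, and a collection of such examples would form the fine-tuning data. In practice `short_llm` would wrap a real model call and `overlap_retriever` would be replaced by a proper retriever.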
