The What, Why, and How of Context Length Extension Techniques in Large Language Models -- A Detailed Survey

Saurav Pawar; S. M Towhidul Islam Tonmoy; S M Mehedi Zaman; Vinija Jain; Aman Chadha; Amitava Das

TL;DR

This survey tackles the context length limitation of Large Language Models by classifying approaches into interpolation and extrapolation, with zero-shot and fine-tuned subfamilies. It catalogs a wide array of techniques, including positional encodings (ALiBi, RoPE, randomized encodings, XPOS), specialized attention (LEX), memory-augmented methods (Landmark Attention, TiM, MemGPT), and window-based or prompt-compression strategies (GrowLength, PoSE, LongLoRA, LongQLoRA, YaRN). The paper synthesizes experimental results, advantages, limitations, and related work, highlighting how these methods enable longer-context processing with varying trade-offs in compute, memory, and compatibility. It also discusses evaluation standards, open challenges, and future directions, calling for standardized benchmarks, interpretability, and efficient training workflows. Overall, the survey provides a structured roadmap of techniques that push LLMs to effectively reason and generate over extended textual contexts, facilitating advances in document understanding, long-form generation, and retrieval-augmented reasoning.

Abstract

The advent of Large Language Models (LLMs) represents a notable breakthrough in Natural Language Processing (NLP), contributing to substantial progress in both text comprehension and generation. However, LLMs often face a limitation in context length extrapolation. Understanding and extending the context length of LLMs is crucial to enhancing their performance across various NLP applications. In this survey paper, we explore why extending context length is essential and the potential transformations that superior techniques could bring to NLP applications. We study the inherent challenges associated with extending context length and present an organized overview of the existing strategies employed by researchers. Additionally, we discuss the intricacies of evaluating context extension techniques and highlight the open challenges that researchers face in this domain. Furthermore, we explore whether there is a consensus within the research community regarding evaluation standards and identify areas where further agreement is needed. This comprehensive survey aims to serve as a valuable resource for researchers, guiding them through the nuances of context length extension techniques and fostering discussions on future advancements in this evolving field.

Paper Structure (100 sections, 4 equations, 17 figures, 1 algorithm)

Introduction
Contemporary Techniques
Positional Techniques
Extrapolation
Zero-shot extrapolation
Position encodings
Attention with Linear Biases (ALiBi)
Working of ALiBi.
Experiments.
Advantages.
Related work.
Rotary Position Embedding (RoPE)
Experiments.
Advantages.
Related work.
...and 85 more sections

Figures (17)

Figure 1: Taxonomy of context length extension techniques in LLMs. The figure divides the techniques into interpolation and extrapolation, each of which is further classified into zero-shot and fine-tuned branches. Positional-encoding-, retrieval-, attention-, and RoPE-based techniques are the most explored in this domain.
Figure 2: Implementation of ALiBi [press2021train]. When computing attention, the method adds a fixed, non-learned bias to each attention score before applying the softmax function. The bias grows linearly with the distance between the query and the key, scaled by a slope 'm'. The rest of the computation remains unchanged. The slope 'm' is a constant specific to each attention head and is set without being adjusted during training. This approach works well across different types of text, various models, and different computational resources.
Figure 3: Visualization of RoPE [SU2024127063], which uses rotation matrices to encode absolute positional information in token sequences. By rotating segments of the query and key projections at different speeds, RoPE gives each position a unique rotation, which in turn influences the attention scores. The figure illustrates this approach, emphasizing that RoPE's attention scores depend on the relative distance between tokens.
Figure 4: Implementation of Randomized Positional Encodings [ruoss2023randomized]. When a model is tested on longer input sequences, the usual way of adding position information can produce values that were not seen during training. The method addresses this by assigning each training example a random but ordered set of positional encoding vectors drawn from the entire range of positions that may occur during testing.
Figure 5: Implementation of blockwise causal attention [sun2022length]. The model is trained on short texts in the same way as a regular Transformer, using causal masking. For longer sequences during testing, blockwise causal attention is employed, which efficiently reuses overlapping parts such as key and value vectors.
...and 12 more figures
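
The ALiBi mechanism described in the Figure 2 caption can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the survey's or the ALiBi authors' implementation: the function names (`alibi_slopes`, `alibi_bias`, `causal_attention_weights`) are hypothetical, and the slope schedule is the geometric one commonly used for power-of-two head counts, which the text above does not specify.

```python
import numpy as np

def alibi_slopes(num_heads):
    # Assumption: geometric slope schedule starting at 2^(-8/num_heads),
    # e.g. 1/2, 1/4, ..., 1/256 for 8 heads. Slopes are fixed, not learned.
    start = 2.0 ** (-8.0 / num_heads)
    return np.array([start ** (h + 1) for h in range(num_heads)])

def alibi_bias(num_heads, seq_len):
    # distance[i, j] = j - i, which is <= 0 for keys at or before query i.
    pos = np.arange(seq_len)
    distance = pos[None, :] - pos[:, None]
    slopes = alibi_slopes(num_heads)
    # Per-head linear penalty, shape (num_heads, seq_len, seq_len).
    return slopes[:, None, None] * distance[None, :, :]

def causal_attention_weights(scores, bias):
    # scores: raw q.k attention scores, shape (num_heads, seq_len, seq_len).
    seq_len = scores.shape[-1]
    # Mask out future keys (j > i), then add the ALiBi bias before softmax.
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    biased = np.where(future, -np.inf, scores + bias)
    # Numerically stable softmax over the key dimension.
    biased = biased - biased.max(axis=-1, keepdims=True)
    weights = np.exp(biased)
    return weights / weights.sum(axis=-1, keepdims=True)
```

With uniform raw scores, the bias alone makes each head attend more strongly to nearby keys than to distant ones, and heads with larger slopes decay faster. Because the penalty depends only on distance, the same function can be called with a `seq_len` longer than any length seen in training.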
