HiRoPE: Length Extrapolation for Code Models Using Hierarchical Position

Kechi Zhang; Ge Li; Huangzhao Zhang; Zhi Jin

TL;DR

HiRoPE introduces a training-free Hierarchical Rotary Position Embedding that encodes code structure via a two-level position vector, enabling exponential context-length extrapolation for code-oriented tasks. By splitting RoPE dimensions across token-level and function/class-level hierarchies and applying a window mechanism, HiRoPE achieves stable improvements across long code language modeling, long-text modeling, and code-symbol understanding without retraining. The method mitigates out-of-distribution issues in position encoding and demonstrates strong performance gains on real-world datasets (CodeParrot, LCC, RepoBench) and a new Code Symbol Understanding task, while maintaining short-sequence performance. These results suggest HiRoPE as a practical, scalable solution for long-context reasoning in code-heavy applications and open avenues for long-structured data modeling in LLMs.
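
The two-level position idea can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the `segment_ids` input, the function names, the `split` parameter, the rule that the token-level index restarts at each new function or class, and the choice to give the first `split` frequency pairs to the token level. The window mechanism is left out.

```python
import math


def hierarchical_positions(segment_ids):
    """Map each token to a (high_level, low_level) position pair.

    segment_ids[i] is the index of the function/class segment that contains
    token i (hypothetical input; how segments are found is not shown here).
    Assumption: the low-level index restarts at 0 whenever the segment changes.
    """
    positions = []
    prev_seg, offset = None, 0
    for seg in segment_ids:
        offset = offset + 1 if seg == prev_seg else 0
        positions.append((seg, offset))
        prev_seg = seg
    return positions


def hierarchical_rope_angles(pos_pair, head_dim, split, base=10000.0):
    """Rotation angles for one token under a two-level dimension split.

    Uses the standard RoPE frequencies theta_i = base ** (-2i / head_dim).
    Assumption: the first `split` frequency pairs rotate by the token-level
    position and the remaining pairs by the function/class-level position.
    """
    high, low = pos_pair
    angles = []
    for i in range(head_dim // 2):
        theta = base ** (-2.0 * i / head_dim)
        pos = low if i < split else high
        angles.append(pos * theta)
    return angles


if __name__ == "__main__":
    pairs = hierarchical_positions([0, 0, 0, 1, 1])
    print(pairs)
    print(hierarchical_rope_angles(pairs[-1], head_dim=8, split=2))
```

In this sketch neither index grows as fast as a flat token index: the token-level index is bounded by the longest segment and the higher-level index by the number of segments. This is one way to read the claim that the method mitigates out-of-distribution positions.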

Abstract

Addressing the limitation of context length in large language models for code-related tasks is the primary focus of this paper. Existing LLMs are constrained by their pre-trained context lengths, leading to performance issues in handling long complex code sequences. Inspired by how human programmers navigate code, we introduce Hierarchical Rotary Position Embedding (HiRoPE), a novel approach that enhances the traditional rotary position embedding into a hierarchical format based on the hierarchical structure of source code. HiRoPE offers easy integration into existing LLMs without extra training costs. Our method is extensively evaluated with various LLMs, demonstrating stable performance in tasks such as language modeling and long code completion. We also introduce a new long code understanding task with real-world code projects, in hopes of promoting further development in this code-related field. Theoretically and experimentally, we find that HiRoPE also addresses the out-of-distribution issue in position encoding. Our HiRoPE significantly expands the context length capabilities of LLMs, enabling inference at lengths exponentially greater than the training length.

Paper Structure (28 sections, 6 equations, 6 figures, 8 tables)

Introduction
Preliminary
Rotary Position Embedding in Transformer
Hierarchical Position in Source Code
Hierarchical RoPE
Hierarchical format
Window Mechanism
Experiment Setup
Base LLMs
Baselines
Inference Settings
Details of Code Symbol Understanding task
Results and Analyses
Long Code Language Modeling
Long Text Language Modeling
...and 13 more sections

Figures (6)

Figure 1: Illustration of the hierarchical position in source code, such as function-level and token-level positions. We also show a simplified abstract syntax tree of the code in the bottom left corner.
Figure 2: Overview of our HiRoPE. We convert the existing position encoding method into a hierarchical format (i.e., function-level and token-level) and apply it across different dimensions. We also add a window mechanism to ensure performance stability (in this figure we set $L_{window}$ to 3).
Figure 3: Illustration of Code Symbol Understanding task. We use the task prompt to guide models to extract and output all defined function and class names in input code.
Figure 4: Ablation Studies including the settings of the dimension split, the window mechanism, and the high-level segment split strategy.
Figure 5: Performance of Short-ShearedLLaMA on CodeParrot dataset. The training length is set to 128. The results suggest our method has the potential to extrapolate code models at an exponential length.
...and 1 more figure
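
The Code Symbol Understanding task in Figure 3 asks a model to list every function and class name defined in the input code. Below is a minimal sketch of how a reference answer could be computed. It assumes the input is Python source and uses the standard-library `ast` module. The helper name `defined_symbols` is hypothetical, and this section does not say how the paper builds its reference answers.

```python
import ast


def defined_symbols(source):
    """Return names of all functions and classes defined in `source`.

    Nested definitions (e.g. methods) are included. Names are returned in
    source order, which is an assumption about the expected output format.
    """
    tree = ast.parse(source)
    defs = [
        node
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]
    # ast.walk does not guarantee source order, so sort by position.
    defs.sort(key=lambda node: (node.lineno, node.col_offset))
    return [node.name for node in defs]


if __name__ == "__main__":
    sample = "class A:\n    def f(self):\n        pass\n\ndef g():\n    pass\n"
    print(defined_symbols(sample))
```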
