
Compressing Code Context for LLM-based Issue Resolution

Haoxiang Jia, Earl T. Barr, Sergey Mechtaev

Abstract

Large Language Models (LLMs) are now capable of resolving real-world GitHub issues. However, current approaches overapproximate the code context and suffer from two compounding problems: the prohibitive cost of processing massive inputs, and low effectiveness as noise floods the context window and distracts the model from the bug-fixing signal. Existing compression techniques fail to resolve this tension: generic compressors compromise the semantic integrity of code, while code-specific tools lack awareness of code structure and task context to preserve essential patch ingredients. To address this, we propose a novel framework consisting of two components. First, Oracle-guided Code Distillation (OCD) is a context distillation algorithm that combines genetic search and delta debugging to systematically reduce code contexts to their minimal sufficient subsequence, retaining only the ingredients required for a successful fix. Second, we use this distilled data to fine-tune SWEzze, a lightweight model that learns to compress code context at inference time, filtering noise and combating distraction while preserving fix ingredients. Evaluated on SWE-bench Verified across three frontier LLMs, SWEzze maintains a stable compression ratio of about 6x across models, reduces the total token budget by 51.8%-71.3% relative to the uncompressed setting, improves issue resolution rates by 5.0%-9.2%, and delivers the best overall balance among effectiveness, compression ratio, and latency compared with state-of-the-art context compression baselines.
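The delta-debugging component of OCD can be illustrated with a minimal sketch. This is not the authors' implementation: it is the classic ddmin reduction (Zeller and Hildebrandt) applied to a list of context segments, with a toy oracle standing in for the real check of whether the downstream LLM still produces a passing patch from the reduced context. All names and the example segments are hypothetical.

```python
def ddmin(segments, oracle):
    """Return a 1-minimal subsequence of `segments` that still satisfies
    `oracle`, using the standard delta-debugging (ddmin) schedule."""
    n = 2  # current partition granularity
    while len(segments) >= 2:
        chunk = max(1, len(segments) // n)
        subsets = [segments[i:i + chunk] for i in range(0, len(segments), chunk)]
        reduced = False
        for i in range(len(subsets)):
            # Try removing one subset: keep the complement.
            complement = [s for j, sub in enumerate(subsets) if j != i for s in sub]
            if oracle(complement):
                segments = complement          # the complement is still sufficient
                n = max(n - 1, 2)              # coarsen, since we shrank the input
                reduced = True
                break
        if not reduced:
            if n >= len(segments):
                break                          # already at finest granularity
            n = min(len(segments), n * 2)      # refine the partition

    return segments

# Toy oracle: pretend the "fix ingredients" are two specific segments;
# any context that contains both is deemed sufficient.
needed = {"def compute_dpi", "scale = 2"}
oracle = lambda ctx: needed.issubset(set(ctx))

context = ["import os", "def compute_dpi", "# docstring", "scale = 2", "print(x)"]
minimal = ddmin(context, oracle)  # -> ["def compute_dpi", "scale = 2"]
```

In OCD the oracle is expensive (an LLM repair attempt plus the test suite), which is why the paper pairs ddmin with genetic search rather than relying on it alone; the sketch above only conveys the reduction loop.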

Paper Structure

This paper contains 18 sections, 1 equation, 7 figures, 4 tables, 1 algorithm.

Figures (7)

  • Figure 1: Agentless + SWEzze generates a correct patch because SWEzze retains the fix ingredients in the compressed context. In contrast, previous compressors (LongCodeZip and SWE-Pruner) degrade Agentless performance by removing these ingredients.
  • Figure 2: A fragment of Agentless' code context for resolving the issue in Figure 1. SWEzze's context has the highest correlation with a minimal sufficient context computed via delta debugging, and is the only one that contains the ingredients to correctly recompute the DPI and address the issue.
  • Figure 3: Relevance density vs. context size. Relevance density monotonically decreases with the increase of context size, confirming that broader retrieval introduces proportionally more noise. Error bars denote interquartile ranges.
  • Figure 4: Relevance density per semantic role. Schema segments achieve the highest relevance density (39.2%), while Generic Utility segments represent the strongest noise source (7.2%). The wide variance across roles motivates role-aware sample weighting in SWEzze training.
  • Figure 5: UpSet plots of resolved-instance overlaps across compression methods on SWE-bench Verified. Each panel corresponds to one downstream repair model. Top bars show the size of resolved-instance intersections, while the left bars show the total number of resolved instances per method, with SWEzze highlighted in orange. The plots reveal both the shared wins and the additional instances uniquely recovered by SWEzze.
  • ...and 2 more figures

Theorems & Definitions (1)

  • Definition 1: Minimal Sufficient Context