SpIDER: Spatially Informed Dense Embedding Retrieval for Software Issue Localization

Shravan Chaudhari, Rahul Thomas Jacob, Mononito Goswami, Jiajun Cao, Shihab Rashid, Christian Bock

TL;DR

The paper tackles the challenge of locating semantically relevant code functions within large repositories, where purely semantic retrieval can miss important spatial context. It introduces SpIDER, a graph-aware dense retrieval framework that augments semantic ranking with BFS-based neighborhood exploration over code graphs, with an LLM selecting which neighbors to keep. Empirical results across Python, Java, JavaScript, and TypeScript demonstrate consistent gains in Recall@K and Acc@K over dense baselines and sparse methods, and the authors provide a new multi-language dataset, SpIDER-Bench. The work shows that leveraging code structure and local neighborhoods yields more accurate function-level localization, enabling more reliable automated software engineering workflows and patch generation.

Abstract

Retrieving code units (e.g., files, classes, functions) that are semantically relevant to a given user query, bug report, or feature request from large codebases is a fundamental challenge for LLM-based coding agents. Agentic approaches typically employ sparse retrieval methods like BM25 or dense embedding strategies to identify relevant units. While embedding-based approaches can outperform BM25 by large margins, they often lack exploration of the codebase and underutilize its underlying graph structure. To address this, we propose SpIDER (Spatially Informed Dense Embedding Retrieval), an enhanced dense retrieval approach that incorporates LLM-based reasoning over auxiliary context obtained through graph-based exploration of the codebase. Empirical results show that SpIDER consistently improves dense retrieval performance across several programming languages.

Paper Structure

This paper contains 31 sections, 2 equations, 6 figures, 14 tables, 1 algorithm.

Figures (6)

  • Figure 1: SpIDER workflow. All functions are first ranked by semantic similarity to the issue description, and the top-$K$ functions are retrieved. From these, the top-$C$ functions (where $C \leq K$) serve as centers for spatial exploration along 'contains' edges. For each center, we use breadth-first search to explore neighboring functions within $d$ hops, considering only neighbors that also rank within the top-$N$ by semantic similarity. An LLM then selects the most relevant neighbors for each center. Finally, the selected neighbors are inserted immediately below their corresponding centers in the ranked list.
  • Figure 2: Dense retrieval methods performance comparison on SWE-PolyBench benchmark.
  • Figure 3: KDE (Kernel Density Estimate) plots with bootstrapped results of SWE-PolyBench benchmark for Recall@20 performance across various retrieval methods.
  • Figure 4: KDE (Kernel Density Estimate) plots with bootstrapped results of SWE-PolyBench benchmark for Acc@20 performance across various retrieval methods.
  • Figure 5: SpIDER vs DER performance at various values of $K$ on SWE-PolyBench benchmark for SweRankEmbed-Small ZS embedding model, $N=500$, $C=5$, $d=4$.
  • ...and 1 more figure
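
The workflow in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `spider_rerank`, the adjacency-dict graph representation, the `select_neighbors` callback (a stand-in for the LLM selection step), and the de-duplication rule are all hypothetical. The sketch also assumes that 'contains' edges are traversed in both directions and that non-function nodes such as files may be crossed during the search; the caption does not specify either point.

```python
from collections import deque


def spider_rerank(ranked, graph, select_neighbors, K=20, C=5, N=500, d=4):
    """Rerank `ranked` (function ids, best semantic match first).

    graph: dict mapping a node id to its neighbors over 'contains' edges
           (assumed undirected; may include non-function nodes such as files).
    select_neighbors: callable(center, candidates) -> chosen candidates.
           Hypothetical stand-in for the LLM selection step.
    """
    top_k = ranked[:K]
    top_n = set(ranked[:N])
    rank_of = {f: i for i, f in enumerate(ranked)}

    picks = {}
    for center in top_k[:C]:
        # Breadth-first search limited to d hops from the center.
        seen = {center}
        queue = deque([(center, 0)])
        candidates = []
        while queue:
            node, depth = queue.popleft()
            if depth == d:
                continue  # do not expand beyond d hops
            for nb in graph.get(node, ()):
                if nb in seen:
                    continue
                seen.add(nb)
                queue.append((nb, depth + 1))
                # Keep only neighbors that also rank within the top-N.
                if nb in top_n:
                    candidates.append(nb)
        candidates.sort(key=rank_of.__getitem__)
        picks[center] = select_neighbors(center, candidates)

    # Insert each center's selected neighbors directly below it.
    # Assumption: an id that already appeared higher up is not repeated.
    out, emitted = [], set()
    for f in top_k:
        for item in [f, *picks.get(f, [])]:
            if item not in emitted:
                emitted.add(item)
                out.append(item)
    return out


# Toy example: two files, each containing a few functions.
example_graph = {
    "a.py": ["f1", "f2", "f3"],
    "b.py": ["f4", "f5"],
    "f1": ["a.py"], "f2": ["a.py"], "f3": ["a.py"],
    "f4": ["b.py"], "f5": ["b.py"],
}
example_ranked = ["f1", "f4", "f2", "f5", "f3"]


def pick_first(center, candidates):
    """Toy selector: keep only the best-ranked candidate."""
    return candidates[:1]


reranked = spider_rerank(example_ranked, example_graph, pick_first,
                         K=2, C=1, N=4, d=2)
print(reranked)
```

In the toy example, `f2` shares a file with the single center `f1`, so it is two hops away and is promoted ahead of `f4` even though its semantic rank is lower. The sketch does not truncate the output back to $K$ entries; whether the final list is cut off after insertion is not stated in the caption.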