Natural Language Summarization Enables Multi-Repository Bug Localization by LLMs in Microservice Architectures

Amirkia Rafiei Oskooei, S. Selcan Yukcu, Mehmet Cevheri Bozoglan, Mehmet S. Aktas

TL;DR

This paper tackles bug localization in large-scale multi-repository microservice architectures by reframing the task as NL-to-NL reasoning. It builds a hierarchical NL knowledge base by converting code across repositories into file- and directory-level summaries anchored by per-repository seed contexts, then applies a two-phase search to route bugs to the correct repository and localize the fault top-down within that repository. Empirical results on a large industrial dataset show the approach outperforms strong retrieval baselines and agentic RAG systems in end-to-end localization (Pass@10, MRR) and provides an interpretable, auditable reasoning path (repository→directory→file). The work demonstrates the practicality and trustworthiness of representation engineering for scalable, enterprise-grade AI-powered developer tools, with implications for broader applications beyond bug localization.

Abstract

Bug localization in multi-repository microservice architectures is challenging due to the semantic gap between natural language bug reports and code, LLM context limitations, and the need to first identify the correct repository. We propose reframing this as a natural language reasoning task by transforming codebases into hierarchical NL summaries and performing NL-to-NL search instead of cross-modal retrieval. Our approach builds context-aware summaries at file, directory, and repository levels, then uses a two-phase search: first routing bug reports to relevant repositories, then performing top-down localization within those repositories. Evaluated on DNext, an industrial system with 46 repositories and 1.1M lines of code, our method achieves Pass@10 of 0.82 and MRR of 0.50, significantly outperforming retrieval baselines and agentic RAG systems like GitHub Copilot and Cursor. This work demonstrates that engineered natural language representations can be more effective than raw source code for scalable bug localization. The resulting repository → directory → file search path is also interpretable, which provides the transparency needed to build trust in enterprise AI tools.
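The abstract reports Pass@10 and MRR without defining them. The sketch below uses the common definitions (a hit if any ground-truth file appears in the top k, and the reciprocal of the rank of the first ground-truth file); the paper's exact evaluation protocol is not given in this section, so the function names and the handling of misses are assumptions.

```python
from typing import Dict, List, Sequence, Set


def pass_at_k(ranked: Sequence[str], gold: Set[str], k: int = 10) -> float:
    """Return 1.0 if any ground-truth file is among the top-k predictions."""
    return 1.0 if any(path in gold for path in ranked[:k]) else 0.0


def reciprocal_rank(ranked: Sequence[str], gold: Set[str]) -> float:
    """Return 1/rank of the first ground-truth file, or 0.0 if none is found."""
    for rank, path in enumerate(ranked, start=1):
        if path in gold:
            return 1.0 / rank
    return 0.0


def evaluate(predictions: List[Sequence[str]],
             golds: List[Set[str]],
             k: int = 10) -> Dict[str, float]:
    """Average both metrics over all bug reports (assumed: simple mean)."""
    n = len(predictions)
    return {
        f"pass@{k}": sum(pass_at_k(p, g, k) for p, g in zip(predictions, golds)) / n,
        "mrr": sum(reciprocal_rank(p, g) for p, g in zip(predictions, golds)) / n,
    }
```

Under these definitions, an MRR of 0.50 is the mean of the reciprocal ranks over all bug reports; it does not imply that the faulty file sits at a particular rank for every report.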

Paper Structure

This paper contains 31 sections, 9 figures, 4 tables.

Figures (9)

  • Figure 1: The levels of code abstraction. While traditional methods move down from Programming Language (PL) to low-level vector embeddings, our approach moves up, transforming PL into a higher-level Natural Language (NL) representation to enable semantic reasoning with LLMs.
  • Figure 2: The generation process for the repository-level summary. A multimodal pipeline processes various repository documents into a single text format. These attachments, along with the repository tree and source code, are used to prompt an LLM to create a comprehensive summary that serves as the seed context for all downstream tasks.
  • Figure 3: Bottom-up construction of the knowledge tree. Leaf nodes (file-level summaries) are generated first, using the repository-level summary as seed context. These are then aggregated to create intermediate nodes (directory-level summaries), forming a hierarchical NL representation of the codebase.
  • Figure 4: The Search Space Routing process. The LLM compares a bug report against repository-level summaries to identify a ranked list of candidate microservices, focusing the search.
  • Figure 5: The top-down bug localization process. The search is first narrowed to candidate directories (Upper Red Rectangle) before a focused search over file-level summaries finds the exact bug location (Lower Red Rectangle).
  • ...and 4 more figures
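
The two-phase search described in Figures 4 and 5 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Node` type, the `localize` function, the example summaries, and the single directory level are all hypothetical, and the `score` function is a toy word-overlap measure standing in for the LLM's NL-to-NL relevance judgement.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One node of the NL knowledge tree: a repository, directory, or file."""
    path: str
    summary: str
    children: List["Node"] = field(default_factory=list)  # empty for files


def score(query: str, summary: str) -> float:
    """Toy stand-in for the LLM comparison: Jaccard overlap of lowercase words."""
    q, s = set(query.lower().split()), set(summary.lower().split())
    return len(q & s) / len(q | s) if q | s else 0.0


def top(query: str, nodes: List[Node], n: int) -> List[Node]:
    """Rank nodes by relevance of their summary to the query and keep the best n."""
    return sorted(nodes, key=lambda x: score(query, x.summary), reverse=True)[:n]


def localize(bug_report: str, repos: List[Node],
             n_repos: int = 1, n_dirs: int = 1, n_files: int = 3) -> List[str]:
    """Phase 1 routes to repositories; phase 2 narrows to directories, then files."""
    hits = []
    for repo in top(bug_report, repos, n_repos):               # search space routing
        for directory in top(bug_report, repo.children, n_dirs):    # directory level
            for f in top(bug_report, directory.children, n_files):  # file level
                hits.append(f"{repo.path} -> {directory.path} -> {f.path}")
    return hits


# Hypothetical knowledge tree with hand-written summaries.
repos = [
    Node("billing", "handles invoice generation and payment processing", [
        Node("src/api", "rest endpoints for invoice requests", [
            Node("InvoiceController.java", "exposes invoice endpoints"),
        ]),
        Node("src/tax", "tax calculation rules for invoice lines", [
            Node("TaxCalculator.java", "computes tax for each invoice total"),
            Node("TaxRateLoader.java", "loads tax rate tables from configuration"),
        ]),
    ]),
    Node("catalog", "stores product catalog and pricing data", []),
]

hits = localize("invoice total shows wrong tax calculation", repos)
```

Each returned string records the full repository, directory, and file path taken by the search, which is what makes the result auditable. In a real system the summaries would be generated bottom-up as in Figure 3 and the ranking at each level would be done by an LLM rather than by word overlap.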