Multi-CoLoR: Context-Aware Localization and Reasoning across Multi-Language Codebases

Indira Vats, Sanjukta De, Subhayan Roy, Saurabh Bodhe, Lejin Varghese, Max Kiehn, Yonas Bedasso, Marsha Chechik

TL;DR

Multi-CoLoR tackles code localization in large, multi-language repositories by coupling organizational memory with graph-based reasoning. It introduces Similar Issue Context (SIC), which uses historical issues to prune the search space, and LocAgent-X, which performs cross-language, graph-guided localization (C++, QML, Python) over a Unified Dependency Graph. Empirical results on industrial AMD data show that SIC improves search-space efficiency and, when combined with graph-based LocAgent-X, yields the highest Acc@5 across languages while reducing tool calls. The approach is modular, scalable, and designed to integrate into end-to-end repair pipelines, enabling robust localization in real-world industrial software ecosystems.

Abstract

Large language models demonstrate strong capabilities in code generation but struggle to navigate complex, multi-language repositories to locate relevant code. Effective code localization requires understanding both organizational context (e.g., historical issue-fix patterns) and structural relationships within heterogeneous codebases. Existing methods either (i) focus narrowly on single-language benchmarks, (ii) retrieve code across languages via shallow textual similarity, or (iii) assume no prior context. We present Multi-CoLoR, a framework for Context-aware Localization and Reasoning across Multi-Language codebases, which integrates organizational knowledge retrieval with graph-based reasoning to traverse complex software ecosystems. Multi-CoLoR operates in two stages: (i) a similar issue context (SIC) module retrieves semantically and organizationally related historical issues to prune the search space, and (ii) a code graph traversal agent (an extended version of LocAgent, a state-of-the-art localization framework) performs structural reasoning within C++ and QML codebases. Evaluations on a real-world enterprise dataset show that incorporating SIC reduces the search space and improves localization accuracy, and graph-based reasoning generalizes effectively beyond Python-only repositories. Combined, Multi-CoLoR improves Acc@5 over both lexical and graph-based baselines while reducing tool calls on an AMD codebase.
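
The abstract reports results as Acc@5 but does not define the metric in this section. The sketch below shows one way such a file-level metric could be computed, under stated assumptions: the function names `hit_at_k` and `acc_at_k` are hypothetical, and the `require_all` flag covers two common readings (at least one gold file in the top k, or every gold file in the top k). Which reading the paper uses is not confirmed here.

```python
from typing import Iterable, Sequence


def hit_at_k(ranked_files: Sequence[str], gold_files: Iterable[str],
             k: int = 5, require_all: bool = False) -> bool:
    """Return True if the top-k ranked files cover the gold fix locations.

    require_all=False: success if any gold file appears in the top k.
    require_all=True:  success only if every gold file appears in the top k.
    Both readings are assumptions; the source does not define the metric here.
    """
    top_k = set(ranked_files[:k])
    gold = set(gold_files)
    if not gold:
        return False  # an issue with no known fix location cannot be a hit
    if require_all:
        return gold <= top_k
    return bool(gold & top_k)


def acc_at_k(predictions: Sequence[Sequence[str]],
             gold: Sequence[Iterable[str]],
             k: int = 5, require_all: bool = False) -> float:
    """Fraction of issues whose ranked predictions count as a hit at k."""
    if not predictions:
        return 0.0
    hits = sum(hit_at_k(p, g, k, require_all) for p, g in zip(predictions, gold))
    return hits / len(predictions)


# Two hypothetical issues: the first is localized correctly, the second is not.
predictions = [["ui/Main.qml", "core/view.cpp"], ["tools/build.py"]]
gold = [{"core/view.cpp"}, {"tools/deploy.py"}]
score = acc_at_k(predictions, gold, k=5)
```

The strict reading (`require_all=True`) penalizes partially correct rankings, which matters when a fix spans several files.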

Paper Structure

This paper contains 33 sections, 4 figures, and 4 tables.

Figures (4)

  • Figure 1: Multi-CoLoR pipeline for context-aware code localization and reasoning in multi-language repositories. ① A new issue is ingested. ② The similar issue module retrieves top-k similar historical issues and artifacts (summaries, components, file paths), producing cues that condition the search. ③ LocAgent-X parses the repository and builds a Unified Dependency Graph from the codebase structure (across QML/C++/Python) to localize the file. ④ The system returns ranked fix locations.
  • Figure 2: Sample issue illustrating a UI defect.
  • Figure 3: Distribution of Code Terms in Issue Descriptions.
  • Figure 4: Algorithm for computing file-level similarity between issue pairs based on hierarchical path structure.
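
Figure 4 refers to a file-level similarity between issue pairs based on hierarchical path structure, but the algorithm itself is not reproduced in this section. As a rough illustration only, the sketch below scores two file paths by the share of leading directory components they have in common, then averages best matches across the files touched by two issues. The function names, the normalization by the deeper path, and the best-match averaging are all assumptions, not the paper's algorithm.

```python
from pathlib import PurePosixPath
from typing import Iterable


def path_similarity(a: str, b: str) -> float:
    """Shared leading path components divided by the deeper path's depth.

    Hypothetical scoring rule: identical paths score 1.0, and paths that
    diverge at the repository root score 0.0.
    """
    parts_a = PurePosixPath(a).parts
    parts_b = PurePosixPath(b).parts
    if not parts_a or not parts_b:
        return 0.0
    shared = 0
    for x, y in zip(parts_a, parts_b):
        if x != y:
            break
        shared += 1
    return shared / max(len(parts_a), len(parts_b))


def issue_file_similarity(files_a: Iterable[str], files_b: Iterable[str]) -> float:
    """Average, over files of issue A, of the best path match in issue B.

    Assumed aggregation; note that it is not symmetric in A and B.
    """
    files_a, files_b = list(files_a), list(files_b)
    if not files_a or not files_b:
        return 0.0
    best = [max(path_similarity(a, b) for b in files_b) for a in files_a]
    return sum(best) / len(best)


# Hypothetical paths: same top-level component, different subdirectories.
sim = path_similarity("ui/qml/Main.qml", "ui/cpp/main.cpp")
```

Under this rule, two issues whose fixes fall in the same subdirectory score higher than issues that share only a top-level component, which is the kind of signal a path hierarchy can offer for pruning the search space.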