Multi-CoLoR: Context-Aware Localization and Reasoning across Multi-Language Codebases
Indira Vats, Sanjukta De, Subhayan Roy, Saurabh Bodhe, Lejin Varghese, Max Kiehn, Yonas Bedasso, Marsha Chechik
TL;DR
Multi-CoLoR tackles code localization in large, multi-language repositories by coupling organizational memory with graph-based reasoning. It introduces Similar Issue Context (SIC), which prunes the search space using historical issues, and LocAgent-X, which performs cross-language, graph-guided localization over C++, QML, and Python code via a Unified Dependency Graph. Empirical results on industrial AMD data show that SIC narrows the search space and that, when combined with graph-based LocAgent-X, it yields the highest Acc@5 across languages while reducing tool calls. The approach is modular and scalable, and it is designed to integrate into end-to-end repair pipelines, enabling robust localization in real-world industrial software ecosystems.
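To make the idea of graph-guided localization over a Unified Dependency Graph concrete, the sketch below shows one way such a traversal could work. It is a minimal illustration under stated assumptions, not the LocAgent-X implementation. The node-id scheme (`lang:path::symbol`), the relation labels (`binds`, `calls`, `invokes`), and the helper names `UnifiedDependencyGraph` and `traverse` are all hypothetical. The traversal is a plain breadth-first expansion from seed nodes, so nodes closer to the seeds are discovered, and ranked, first.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class UnifiedDependencyGraph:
    """Hypothetical cross-language graph. Node ids are strings such as
    'cpp:src/player.cpp::Player'; each edge carries a relation label."""
    edges: dict[str, list[tuple[str, str]]] = field(default_factory=dict)

    def add_edge(self, src: str, dst: str, relation: str) -> None:
        # Store both directions so traversal can move from a QML view to the
        # C++ class it binds to, and back again.
        self.edges.setdefault(src, []).append((dst, relation))
        self.edges.setdefault(dst, []).append((src, relation))

    def neighbors(self, node: str) -> list[tuple[str, str]]:
        return self.edges.get(node, [])


def traverse(graph: UnifiedDependencyGraph, seeds: list[str],
             max_hops: int = 2) -> list[tuple[str, int]]:
    """Breadth-first expansion from seed nodes. Returns (node, hop distance)
    pairs in discovery order, so nearer nodes come first."""
    seeds = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep order
    dist = {s: 0 for s in seeds}
    order = list(seeds)
    queue = deque((s, 0) for s in seeds)
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop budget
        for nxt, _relation in graph.neighbors(node):
            if nxt not in dist:
                dist[nxt] = hops + 1
                order.append(nxt)
                queue.append((nxt, hops + 1))
    return [(n, dist[n]) for n in order]


if __name__ == "__main__":
    g = UnifiedDependencyGraph()
    g.add_edge("qml:ui/Player.qml", "cpp:src/player.cpp::Player", "binds")
    g.add_edge("cpp:src/player.cpp::Player", "cpp:src/decoder.cpp::Decoder", "calls")
    g.add_edge("cpp:src/decoder.cpp::Decoder", "py:tools/check.py", "invokes")
    print(traverse(g, ["qml:ui/Player.qml"], max_hops=2))
    # [('qml:ui/Player.qml', 0), ('cpp:src/player.cpp::Player', 1),
    #  ('cpp:src/decoder.cpp::Decoder', 2)]
```

Storing edges in both directions is a design choice made for this sketch. It lets a search that starts in a QML file reach the C++ backend and a Python tool in the same pass. An agent could, of course, restrict expansion to particular relation types instead.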
Abstract
Large language models demonstrate strong capabilities in code generation but struggle to navigate complex, multi-language repositories to locate relevant code. Effective code localization requires understanding both organizational context (e.g., historical issue-fix patterns) and structural relationships within heterogeneous codebases. Existing methods either (i) focus narrowly on single-language benchmarks, (ii) retrieve code across languages via shallow textual similarity, or (iii) assume no prior context. We present Multi-CoLoR, a framework for Context-aware Localization and Reasoning across Multi-Language codebases, which integrates organizational knowledge retrieval with graph-based reasoning to traverse complex software ecosystems. Multi-CoLoR operates in two stages: (i) a Similar Issue Context (SIC) module retrieves semantically and organizationally related historical issues to prune the search space, and (ii) a code graph traversal agent (an extended version of LocAgent, a state-of-the-art localization framework) performs structural reasoning within C++ and QML codebases. Evaluations on a real-world enterprise dataset show that incorporating SIC reduces the search space and improves localization accuracy, and that graph-based reasoning generalizes effectively beyond Python-only repositories. Combining both stages, Multi-CoLoR improves Acc@5 over both lexical and graph-based baselines while reducing tool calls on an AMD codebase.
