A RAG Method for Source Code Inquiry Tailored to Long-Context LLMs
Toshihiro Kamiya
TL;DR
The paper addresses the difficulty of answering questions about source code when a codebase is too large for an LLM's context window. It proposes a Retrieval-Augmented Generation (RAG) method that records an execution trace, builds a call tree from it, and extracts the source code of the functions involved. The call tree and source code are included as documents in the prompt, so the LLM can answer questions about a software product without loading its entire codebase. Experiments on the open-source rich-cli project with long-context LLMs show a consistent trend: including the call tree and ordered source code improves answer quality, although extremely large prompts can strain context-length limits. The work demonstrates a practical path for applying LLMs to complex software tasks and highlights design choices in prompt construction. Future directions include automated prompt generation and broader task coverage.
Abstract
Although the context length limitation of large language models (LLMs) has been mitigated, it still hinders their application to software development tasks. This study proposes a method that incorporates execution traces into retrieval-augmented generation (RAG) for inquiries about source code. Small-scale experiments indicate that the method tends to improve the quality of LLM responses.
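
As an illustration of the pipeline summarized above (execution trace, call tree, function source code, prompt), the following is a minimal sketch and not the paper's implementation. It assumes Python's `sys.setprofile` as the tracing mechanism and assumes that source code is ordered by first call. The helper names (`trace_calls`, `render_call_tree`, `collect_sources`, `build_prompt`) and the toy program are hypothetical.

```python
import inspect
import sys


def trace_calls(entry, *args, **kwargs):
    """Run entry(*args, **kwargs) and record (depth, code object) for each Python-level call."""
    calls = []  # entries appear in call order
    depth = 0

    def profiler(frame, event, arg):
        nonlocal depth
        if event == "call":
            calls.append((depth, frame.f_code))
            depth += 1
        elif event == "return":
            depth -= 1
        # "c_call"/"c_return" events (built-in functions) are ignored.

    sys.setprofile(profiler)
    try:
        entry(*args, **kwargs)
    finally:
        sys.setprofile(None)
    return calls


def render_call_tree(calls):
    """Render the recorded calls as an indented text tree."""
    return "\n".join("  " * depth + code.co_name for depth, code in calls)


def collect_sources(calls):
    """Return the source of each called function once, ordered by first call (an assumption)."""
    seen, sources = set(), []
    for _, code in calls:
        if code in seen:
            continue
        seen.add(code)
        try:
            sources.append(inspect.getsource(code))
        except (OSError, TypeError):
            # Source may be unavailable, e.g. for code not backed by a file.
            sources.append(f"# source unavailable for {code.co_name}\n")
    return sources


def build_prompt(question, calls):
    """Assemble the call tree, the function sources, and the question into one prompt."""
    return "\n\n".join([
        "Call tree:\n" + render_call_tree(calls),
        "Source code:\n" + "\n".join(collect_sources(calls)),
        "Question: " + question,
    ])


# Hypothetical toy program standing in for the product under inquiry.
def parse(text):
    return text.split()


def count(words):
    return len(words)


def main(text):
    return count(parse(text))


calls = trace_calls(main, "a b c")
prompt = build_prompt("How does main compute its result?", calls)
print(prompt)
```

In this sketch the call tree lists `main` with `parse` and `count` nested beneath it, and the prompt contains only the functions that actually ran. A real product would likely also need filtering of library frames and a check that the assembled prompt fits the model's context length.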
