Understanding Codebase like a Professional! Human-AI Collaboration for Code Comprehension
Jie Gao, Yue Xue, Xiaofei Xie, SoeMin Thant, Erika Lee, Bowen Xu
TL;DR
The paper addresses the challenge of understanding large, unfamiliar codebases, a task for which current LLM-based tools fall short of providing adaptive, structured guidance. It derives a hierarchical understanding flow (global, then local, then detailed) from professional code auditors and formulates design opportunities for an LLM-powered codebase understanding system. The authors implement CodeMap, a prototype that extracts, decomposes, and visualizes codebase information across layers, with interactive layer-switching and RAG-backed responses. A user study with nine experienced and six novice developers shows that CodeMap increases perceived intuitiveness and usefulness, reduces reliance on reading LLM outputs by 79%, and increases map usage time by 90% compared with a static visualization analysis tool, with notable benefits for novices and larger projects. The work points toward human-AI co-understanding in software comprehension, with practical implications for onboarding, education, and scalable code understanding.
Abstract
Understanding an unfamiliar codebase is an essential task for developers in various scenarios, such as onboarding. Existing studies have shown that LLMs often fail to support users in understanding code structures or to provide user-centered, adaptive, and dynamic assistance in real-world settings. To address this, we propose learning from the perspective of a unique role, code auditors, whose work often requires them to quickly familiarize themselves with new code projects on a weekly or even daily basis. We recruited and interviewed eight code auditing practitioners to understand how they master codebase understanding. We identified several design opportunities for an LLM-based codebase understanding system: supporting cognitive alignment through automated codebase information extraction, decomposition, and representation, as well as reducing manual effort and conversational distraction through interaction design. To validate these opportunities, we designed a prototype, CodeMap, that provides dynamic information extraction and representation aligned with the human cognitive flow and enables interactive switching among hierarchical codebase visualizations. To evaluate the usefulness of our system, we conducted a user study with nine experienced developers and six novice developers. Our results demonstrate that CodeMap improved users' perceived intuitiveness, ease of use, and usefulness in supporting code comprehension, while reducing their reliance on reading and interpreting LLM responses by 79% and increasing map usage time by 90% compared with a static visualization analysis tool. It also enhanced novice developers' perceived understanding and reduced their aimless exploration.
