Understanding Codebase like a Professional! Human-AI Collaboration for Code Comprehension

Jie Gao, Yue Xue, Xiaofei Xie, SoeMin Thant, Erika Lee, Bowen Xu

TL;DR

The paper addresses the challenge of understanding large, unfamiliar codebases, where current LLM-based tools fall short of providing adaptive, structured guidance. From interviews with professional code auditors, it derives a hierarchical, global-to-local-to-detailed understanding flow and formulates design opportunities for an LLM-powered codebase understanding system. The authors implement CodeMap, a prototype that extracts, decomposes, and visualizes codebase information across layers, with interactive layer-switching and RAG-backed responses. A user study with nine experienced and six novice developers shows that CodeMap increases perceived intuitiveness and usefulness, reduces reliance on reading and interpreting LLM responses by 79%, and increases map usage time by 90% relative to a static visualization analysis tool, with notable benefits for novices and larger projects. This work demonstrates a path toward human-AI co-understanding in software comprehension and has practical implications for onboarding, education, and scalable code understanding.

Abstract

Understanding an unfamiliar codebase is an essential task for developers in various scenarios, such as during the onboarding process. Existing studies have shown that LLMs often fail to support users in understanding code structures or to provide user-centered, adaptive, and dynamic assistance in real-world settings. To address this, we propose learning from the perspective of a unique role, code auditors, whose work often requires them to quickly familiarize themselves with new code projects on a weekly or even daily basis. To this end, we recruited and interviewed 8 code auditing practitioners to understand how they master codebase understanding. We identified several design opportunities for an LLM-based codebase understanding system: supporting cognitive alignment through automated codebase information extraction, decomposition, and representation, as well as reducing manual effort and conversational distraction through interaction design. To validate these opportunities, we designed a prototype, CodeMap, that provides dynamic information extraction and representation aligned with the human cognitive flow and enables interactive switching among hierarchical codebase visualizations. To evaluate the usefulness of our system, we conducted a user study with nine experienced developers and six novice developers. Our results demonstrate that CodeMap improved users' perceived intuitiveness, ease of use, and usefulness in supporting code comprehension, while reducing their reliance on reading and interpreting LLM responses by 79% and increasing map usage time by 90% compared with the static visualization analysis tool. It also enhanced novice developers' perceived understanding and reduced their unpurposeful exploration.
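
To make the idea of switching among hierarchical views concrete, the following is a minimal sketch of a layered codebase map with a zoom operation. It is not CodeMap's implementation: the `MapNode` class, the `zoom` and `outline` helpers, the layer labels, and all example names other than "Security Module" (taken from the Figure 3 caption) are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MapNode:
    """One entry in a hypothetical hierarchical codebase map."""
    name: str
    layer: str  # assumed labels: "global", "local", or "detailed"
    summary: str = ""
    children: list[MapNode] = field(default_factory=list)


def zoom(node: MapNode, path: list[str]) -> MapNode:
    """Follow child names from `node` to reach a deeper layer."""
    current = node
    for name in path:
        matches = [c for c in current.children if c.name == name]
        if not matches:
            raise KeyError(f"{name!r} not found under {current.name!r}")
        current = matches[0]
    return current


def outline(node: MapNode, depth: int = 0) -> list[str]:
    """Render the map as indented lines, one per node."""
    lines = [f"{'  ' * depth}[{node.layer}] {node.name}"]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


# Hypothetical example data; only "Security Module" appears in the source.
example = MapNode(
    name="Example Project",
    layer="global",
    summary="Top-level view of modules and their relationships.",
    children=[
        MapNode(
            name="Security Module",
            layer="local",
            summary="Handles authentication concerns (illustrative).",
            children=[
                MapNode(name="verify_token", layer="detailed"),
            ],
        ),
    ],
)

if __name__ == "__main__":
    print("\n".join(outline(example)))
    print(zoom(example, ["Security Module"]).summary)
```

In this sketch, a reader starts from the global node and calls `zoom` with a path of names to move to a local or detailed view, which mirrors the global-to-local-to-detailed flow described above without claiming anything about how the prototype stores or renders its maps.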

Paper Structure

This paper contains 29 sections, 8 figures, and 2 tables.

Figures (8)

  • Figure 1: Methodology overview.
  • Figure 2: Human-LLM co-understanding of a codebase. An LLM-powered codebase understanding system could effectively assist human cognitive understanding by offering improved information extraction, representation, and decomposition.
  • Figure 3: Interface for an example business component map. To support codebase reading, CodeMap allows users to start with 1) Global Understanding through the Global Business Component Map, which provides natural language–enhanced guidance on key modules, components, files, functions and their relationships. Then, users can 2) zoom into a specific module (e.g., “Security Module”) to view textual instructions in the right pane that describe the module’s details and its relationships with other modules.
  • Figure 4: An illustration of how the user uses the system.
  • Figure 5: CodeMap implementation.
  • ...and 3 more figures