DocAgent: A Multi-Agent System for Automated Code Documentation Generation

Dayu Yang, Antoine Simoulin, Xin Qian, Xiaoyi Liu, Yuwei Cao, Zhaopu Teng, Grey Yang

TL;DR

DocAgent addresses unreliable automatic documentation in large and proprietary codebases with a dependency-aware multi-agent system that builds context incrementally. The Navigator computes a dependency-first generation order over the repository's AST-derived DAG, so that each component is documented only after the components it depends on. The Reader, Searcher, Writer, Verifier, and Orchestrator agents then collaboratively draft, verify, and refine the documentation. The paper also proposes a multi-faceted evaluation framework covering Completeness, Helpfulness, and Truthfulness, and reports experiments in which DocAgent outperforms state-of-the-art baselines. The results point to topological processing and adaptive context management as the main contributors to reliable, scalable documentation generation; the paper also discusses ethics and limitations.

Abstract

High-quality code documentation is crucial for software development, especially in the era of AI. However, generating it automatically with Large Language Models (LLMs) remains challenging, as existing approaches often produce incomplete, unhelpful, or factually incorrect outputs. We introduce DocAgent, a novel multi-agent collaborative system that uses topological code processing for incremental context building. Specialized agents (Reader, Searcher, Writer, Verifier, Orchestrator) then collaboratively generate documentation. We also propose a multi-faceted evaluation framework assessing Completeness, Helpfulness, and Truthfulness. Comprehensive experiments show that DocAgent consistently and significantly outperforms baselines. Our ablation study confirms the vital role of the topological processing order. DocAgent offers a robust approach for reliable code documentation generation in complex and proprietary repositories.
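
To make the idea of a dependency-first (topological) processing order concrete, the following is a minimal sketch and not DocAgent's actual Navigator. It assumes a single Python module, treats only direct calls between top-level functions as dependencies, and assumes the resulting graph has no cycles. The names `SOURCE`, `build_dependency_graph`, and `documentation_order` are hypothetical and chosen only for illustration.

```python
import ast
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical toy module: summarize -> parse -> load.
SOURCE = '''
def load(path):
    return open(path).read()

def parse(path):
    return load(path).split()

def summarize(path):
    words = parse(path)
    return len(words)
'''


def build_dependency_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function to the top-level functions it calls."""
    tree = ast.parse(source)
    functions = {
        node.name: node
        for node in tree.body
        if isinstance(node, ast.FunctionDef)
    }
    graph: dict[str, set[str]] = {}
    for name, node in functions.items():
        deps: set[str] = set()
        for child in ast.walk(node):
            # Only plain calls such as `load(...)` count as dependencies here;
            # attribute calls, imports, and classes are ignored in this sketch.
            if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                callee = child.func.id
                if callee in functions and callee != name:
                    deps.add(callee)
        graph[name] = deps
    return graph


def documentation_order(source: str) -> list[str]:
    """Return function names so that dependencies come before dependents."""
    graph = build_dependency_graph(source)
    # TopologicalSorter expects {node: predecessors}; static_order() yields
    # predecessors first and raises graphlib.CycleError on cyclic input.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(documentation_order(SOURCE))  # ['load', 'parse', 'summarize']
```

Under these assumptions, `load` is documented first, so its documentation could already serve as context when `parse` and then `summarize` are processed. A real repository would also need to handle classes, methods, cross-file imports, and circular dependencies, none of which this sketch covers.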

Paper Structure

This paper contains 26 sections, 9 figures, and 5 tables.

Figures (9)

  • Figure 1: Architecture of DocAgent: (1) The Navigator Module uses AST parsing to build a dependency DAG and traverses it in topological order. (2) The Multi-Agent framework uses specialized agents (Reader, Searcher, Writer, Verifier) with tools for context-aware documentation generation.
  • Figure 2: Screenshot of the DocAgent live code documentation generation page.
  • Figure 3: Multi-facet evaluation framework for code documentation, assessing quality along three dimensions: (1) Completeness measures structural adherence to documentation conventions; (2) Helpfulness evaluates practical utility; and (3) Truthfulness verifies factual accuracy.
  • Figure 4: Screenshot of the DocAgent live evaluation framework.
  • Figure 5: Distribution of repositories by code documentation coverage.
  • ...and 4 more figures