KCoEvo: A Knowledge Graph Augmented Framework for Evolutionary Code Generation

Jiazhen Kang, Yuchen Lu, Chen Jiang, Jinrui Liu, Tianhao Zhang, Bo Jiang, Ningyuan Sun, Tongtong Wu, Guilin Qi

TL;DR

This work proposes a knowledge graph-augmented framework that decomposes the migration task into two synergistic stages: evolution path retrieval and path-informed code generation, enabling structured reasoning over API evolution.

Abstract

Code evolution is inevitable in modern software development. Changes to third-party APIs frequently break existing code and complicate maintenance, posing practical challenges for developers. While large language models (LLMs) have shown promise in code generation, they struggle to reason without a structured representation of these evolving relationships, often leading them to produce outdated APIs or invalid outputs. In this work, we propose a knowledge graph-augmented framework that decomposes the migration task into two synergistic stages: evolution path retrieval and path-informed code generation. Our approach constructs static and dynamic API graphs to model intra-version structures and cross-version transitions, enabling structured reasoning over API evolution. Both modules are trained with synthetic supervision automatically derived from real-world API diffs, ensuring scalability and minimal human effort. Extensive experiments across single-package and multi-package benchmarks demonstrate that our framework significantly improves migration accuracy, controllability, and execution success over standard LLM baselines. The source code and datasets are available at: https://github.com/kangjz1203/KCoEvo.
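
The two stages described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the package name `pkg`, its functions, the version strings, and the helpers `retrieve_evolution_path` and `build_prompt` are all hypothetical, and breadth-first search stands in for whatever retrieval model the paper actually trains. Only the idea of cross-version transition edges, and the 'Renaming' label, come from the paper's own description.

```python
from collections import deque

# A node is (qualified API name, package version). All entries are hypothetical.
Node = tuple[str, str]

# Cross-version transitions: old node -> [(relation label, new node)].
TRANSITIONS: dict[Node, list[tuple[str, Node]]] = {
    ("pkg.load_model", "1.0"): [("Deprecation", ("pkg.load_model", "1.5"))],
    ("pkg.load_model", "1.5"): [("Renaming", ("pkg.load", "2.0"))],
}


def retrieve_evolution_path(start: Node, target_version: str):
    """Stage 1 (sketch): breadth-first search over cross-version edges.

    Returns the list of (relation, node) hops leading from `start` to the
    first node found at `target_version`, or None if no path exists.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node[1] == target_version:
            return path
        for relation, nxt in TRANSITIONS.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(relation, nxt)]))
    return None


def build_prompt(old_code: str, start: Node, path) -> str:
    """Stage 2 (sketch): turn the retrieved path into text that conditions an LLM."""
    steps = [f"{start[0]}@{start[1]}"]
    steps += [f"--{rel}--> {name}@{ver}" for rel, (name, ver) in path]
    return (
        "Migrate the code along this API evolution path:\n"
        + " ".join(steps)
        + "\n\nCode:\n"
        + old_code
    )


path = retrieve_evolution_path(("pkg.load_model", "1.0"), "2.0")
prompt = build_prompt("m = pkg.load_model('m.pt')", ("pkg.load_model", "1.0"), path)
```

Keeping retrieval separate from generation means the path can be inspected or corrected before any code is produced, which is one plausible reading of the controllability claim in the abstract.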

Paper Structure (26 sections, 4 equations, 4 figures, 4 tables)

Figures (4)

  • Figure 1: Illustration of API evolution in PyTorch across three update stages. The figure depicts the progressive modernization of PyTorch’s inference pipeline from v0.3.x (2017) to v2.0.x (2023), covering the transition from Deprecated APIs to Minor and Major Updates. The lower section highlights major industry adopters (e.g., Amazon, Microsoft, Baidu, TikTok, Google, and VSCode), with data samples partially collected from their open-source GitHub repositories, emphasizing the practical relevance of version-aware code migration in large-scale production ecosystems.
  • Figure 2: Overview of the proposed framework for evolutionary code generation.
  • Figure 3: Lightweight schema of the version-aware knowledge graph. Blue edges represent intra-version relations constructed offline; red edges indicate cross-version transitions established dynamically at runtime. Dashed lines, associated with 'Renaming' or 'Relocation', indicate more complex transformations where an entity evolves through significant modifications to its core attributes or contract.
  • Figure 4: Representative error types observed in version-aware code migration. The examples illustrate typical reasoning failures encountered by LLMs, including (1) Token Presence Error, (2) Code Validity Error, (3) Parameter Count Error and (4) Parameter Value Error. These cases highlight the limitations of surface-level pattern learning and motivate the need for structured, knowledge-driven reasoning in API evolution tasks.
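
The four error types named in the Figure 4 caption could be checked mechanically along the following lines. This is only a sketch under stated assumptions: the caption does not define the categories, so the interpretations used here (unparseable output, target API call missing, wrong number of arguments, wrong keyword value) are guesses. The function `classify_errors` and the API `pkg.load` are hypothetical, and only simple literal keyword values are compared.

```python
import ast


def classify_errors(code: str, api: str, n_args: int, expected_kwargs: dict) -> list[str]:
    """Label a migrated snippet with assumed versions of the Figure 4 error types."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        # Assumed meaning of "Code Validity Error": the output does not parse.
        return ["code_validity"]

    # Collect every call whose dotted target name equals the expected new API.
    calls = [
        node for node in ast.walk(tree)
        if isinstance(node, ast.Call) and ast.unparse(node.func) == api
    ]
    if not calls:
        # Assumed meaning of "Token Presence Error": the target API never appears.
        return ["token_presence"]

    errors = []
    for call in calls:
        # Assumed meaning of "Parameter Count Error": wrong number of arguments.
        if len(call.args) + len(call.keywords) != n_args:
            errors.append("parameter_count")
        # Assumed meaning of "Parameter Value Error": a keyword has the wrong literal.
        for kw in call.keywords:
            if kw.arg in expected_kwargs:
                if ast.unparse(kw.value) != repr(expected_kwargs[kw.arg]):
                    errors.append("parameter_value")
    return errors


ok = classify_errors("m = pkg.load('m.pt', device='cpu')", "pkg.load", 2, {"device": "cpu"})
stale = classify_errors("m = pkg.load_model('m.pt', device='cpu')", "pkg.load", 2, {"device": "cpu"})
```

Here `ok` is an empty list, while `stale` is flagged because the snippet still calls the old name instead of `pkg.load`.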