SemanticForge: Repository-Level Code Generation through Semantic Knowledge Graphs and Constraint Satisfaction

Wuyang Zhang, Chenkai Zhang, Zhen Luo, Jianming Ma, Wangming Yuan, Chuqiao Gu, Chenwei Feng

TL;DR

SemanticForge tackles repository-scale code generation by replacing surface-level retrieval with explicit, queryable semantic representations of codebases. It introduces a dual static-dynamic knowledge graph, a neural planner that generates structured graph queries, an SMT-integrated beam search that enforces constraints during generation, and an incremental maintenance system with provable optimality guarantees. On RepoKG-50, it reduces both schematic and logical hallucinations, improves functional correctness, and keeps the knowledge graph up to date in real time as repositories grow. The result is a practical, end-to-end framework for semantically-aware code synthesis that can adapt to evolving codebases and diverse programming domains. Overall, SemanticForge provides both theoretical foundations and empirical evidence that explicit semantic representations combined with constraint-aware generation can improve repository-level code generation in real-world settings.

Abstract

Large language models (LLMs) have transformed software development by enabling automated code generation, yet they frequently suffer from systematic errors that limit practical deployment. We identify two critical failure modes: \textit{logical hallucination} (incorrect control/data-flow reasoning) and \textit{schematic hallucination} (type mismatches, signature violations, and architectural inconsistencies). These errors stem from the absence of explicit, queryable representations of repository-wide semantics. This paper presents \textbf{SemanticForge}, which introduces four fundamental algorithmic advances for semantically-aware code generation: (1) a novel automatic reconciliation algorithm for dual static-dynamic knowledge graphs, unifying compile-time and runtime program semantics; (2) a neural approach that learns to generate structured graph queries from natural language, achieving 73\% precision versus 51\% for traditional retrieval; (3) a novel beam search algorithm with integrated SMT solving, enabling real-time constraint verification during generation rather than post-hoc validation; and (4) an incremental maintenance algorithm that updates knowledge graphs in $O(|\Delta R| \cdot \log n)$ time while maintaining semantic equivalence.
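
To make advance (3) concrete, the following is a minimal sketch of constraint checking inside beam search, as opposed to validating finished candidates afterwards. It is not SemanticForge's decoder: the SMT solver is swapped for a plain Python feasibility predicate, and `SIGNATURES`, `toy_model`, `is_feasible`, and `constrained_beam_search` are hypothetical names introduced only for this illustration. The toy model deliberately assigns its highest score to a function that does not exist in the repository, so the effect of pruning is visible.

```python
import math
from typing import Callable, Dict, List, Tuple

Seq = Tuple[str, ...]

# Assumption: repository facts reduced to "function name -> arity".
SIGNATURES: Dict[str, int] = {"load_user": 1, "save_user": 2}


def toy_model(prefix: Seq) -> Dict[str, float]:
    """Hypothetical stand-in for an LLM: next-token log-probabilities."""
    if not prefix:
        # The most likely first token is a hallucinated function name.
        return {
            "store_user": math.log(0.6),
            "save_user": math.log(0.3),
            "load_user": math.log(0.1),
        }
    return {"arg": math.log(0.5), "END": math.log(0.5)}


def is_feasible(prefix: Seq) -> bool:
    """Stand-in for the solver call: can this prefix still become a valid call?"""
    if not prefix:
        return True
    name, rest = prefix[0], prefix[1:]
    if name not in SIGNATURES:
        return False  # unknown symbol: signature violation
    n_args = sum(1 for tok in rest if tok == "arg")
    if rest and rest[-1] == "END":
        return n_args == SIGNATURES[name]  # a finished call needs exact arity
    return n_args <= SIGNATURES[name]  # a partial call must not exceed arity


def constrained_beam_search(
    model: Callable[[Seq], Dict[str, float]],
    feasible: Callable[[Seq], bool],
    beam_width: int = 2,
    max_len: int = 5,
) -> List[Tuple[Seq, float]]:
    beams: List[Tuple[Seq, float]] = [((), 0.0)]
    finished: List[Tuple[Seq, float]] = []
    for _ in range(max_len):
        candidates: List[Tuple[Seq, float]] = []
        for seq, score in beams:
            for tok, logp in model(seq).items():
                extended = seq + (tok,)
                if not feasible(extended):
                    continue  # pruned during generation, not after it
                target = finished if tok == "END" else candidates
                target.append((extended, score + logp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
        if not beams:
            break
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished


for seq, score in constrained_beam_search(toy_model, is_feasible):
    print(" ".join(seq), round(math.exp(score), 4))
```

In this toy setting the hallucinated `store_user` never enters the beam, and calls with the wrong number of arguments are discarded as soon as they become infeasible. A real system would replace `is_feasible` with solver queries over type and signature constraints drawn from the knowledge graph.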

Paper Structure

This paper contains 241 sections, 18 equations, 7 figures, 13 tables.

Figures (7)

  • Figure 1: Complete architecture of the SemanticForge system. The pipeline consists of four integrated stages: (I) Repository Knowledge Graph Construction combining static analysis and dynamic traces, (II) Neural Query Planner that transforms instructions into graph queries, (III) Schematic-Constraint Decoder ensuring semantic correctness, and (IV) Continual Maintenance Agent for incremental updates. Each stage addresses specific aspects of the repository-level code generation problem while maintaining overall system coherence.
  • Figure 2: Dual static-dynamic knowledge graph construction with automatic reconciliation.
  • Figure 3: Neural query planner transforming instructions to graph queries via learned generation.
  • Figure 4: SMT-integrated beam search pruning invalid paths during generation.
  • Figure 5: Incremental knowledge graph maintenance performance on RepoKG-50. (a) Update latency scales with change size, achieving 90% reduction versus full reconstruction for typical commits. (b) Component-wise breakdown reveals re-extraction as the dominant cost. (c) Memory usage remains proportional to change size, enabling workstation deployment. (d) Real-world repositories show consistent 66x average speedup.
  • ...and 2 more figures
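
The idea behind stage (IV) and Figure 5, updating the graph from a commit instead of rebuilding it, can be sketched as follows. This is an illustration under strong assumptions, not the paper's algorithm: the graph is a per-file set of call edges, the extractor is a toy line parser, and `extract_edges`, `build_graph`, and `apply_commit` are hypothetical names. The sketch does not demonstrate the stated complexity bound; it only shows that re-extracting the changed files yields the same graph as a full reconstruction.

```python
from typing import Dict, Optional, Set, Tuple

Edge = Tuple[str, str]
Repo = Dict[str, str]            # path -> file contents
Graph = Dict[str, Set[Edge]]     # path -> call edges extracted from that file


def extract_edges(source: str) -> Set[Edge]:
    """Toy extractor: each line 'caller -> callee' yields one call edge."""
    edges: Set[Edge] = set()
    for line in source.splitlines():
        if "->" in line:
            caller, callee = (part.strip() for part in line.split("->", 1))
            edges.add((caller, callee))
    return edges


def build_graph(repo: Repo) -> Graph:
    """Full reconstruction: re-extract every file in the repository."""
    return {path: extract_edges(src) for path, src in repo.items()}


def apply_commit(graph: Graph, repo: Repo, changed: Dict[str, Optional[str]]) -> None:
    """Incremental update: touch only files named in the commit.

    A value of None marks a deleted file; any other value is the new contents.
    """
    for path, new_src in changed.items():
        if new_src is None:
            repo.pop(path, None)
            graph.pop(path, None)
        else:
            repo[path] = new_src
            graph[path] = extract_edges(new_src)


repo: Repo = {"a.py": "main -> load\nmain -> save", "b.py": "save -> write"}
graph = build_graph(repo)
apply_commit(graph, repo, {"b.py": "save -> flush", "c.py": "load -> read"})
print(graph == build_graph(repo))
```

Keying edges by the file that owns them is the design choice that makes the update local: a commit invalidates exactly the entries for the files it touches. Cross-file facts such as resolved types would need dependency tracking that this sketch omits.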