SRLCG: Self-Rectified Large-Scale Code Generation with Multidimensional Chain-of-Thought and Dynamic Backtracking
Hongru Ma, Yanjie Liang, Jiasheng Si, Weiyu Zhang, Hongjiao Guan, Chaoqun Zheng, Bing Xu, Wenpeng Lu
TL;DR
This paper introduces SRLCG, a framework for generating complete, multi-file software projects from a single prompt. It targets two problems: prior code-generation methods are hard for non-experts to use, and they are brittle. SRLCG combines a multidimensional chain-of-thought (CoT) with strategic, tactical, and operational levels, a dynamic backtracking engine, and adaptive self-rectification to ensure project-level coherence, correctness, and robustness. With both GPT-4 and DeepSeek-V3 as backbones, SRLCG produces substantially longer and more complete projects than strong CoT baselines, and it also improves correctness, usability, and robustness. The work shows practical potential for enabling non-programmers to generate robust, scalable software architectures, and it identifies improving inference efficiency as future work.
Abstract
Large language models (LLMs) have revolutionized code generation, significantly enhancing developer productivity. However, for a vast number of users with minimal coding knowledge, LLMs provide little support, as they primarily generate isolated code snippets rather than complete, large-scale project code. Without coding expertise, these users struggle to interpret, modify, and iteratively refine the outputs of LLMs, making it impossible to assemble a complete project. To address this issue, we propose the Self-Rectified Large-Scale Code Generator (SRLCG), a framework that generates complete multi-file project code from a single prompt. SRLCG employs a novel multidimensional chain-of-thought (CoT) and self-rectification to guide LLMs in generating correct and robust code files, then integrates these files into a complete and coherent project using our proposed dynamic backtracking algorithm. Experimental results show that SRLCG generates code 15x longer than DeepSeek-V3, 16x longer than GPT-4, and at least 10x longer than other leading CoT-based baselines. The results further confirm that SRLCG improves correctness, robustness, and performance over the baselines in large-scale code generation.
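The abstract says generated files are integrated by a dynamic backtracking algorithm but gives no details of it. The following is a hypothetical sketch of the general idea, not SRLCG's actual algorithm. It makes several assumptions. First, the strategic/tactical/operational plan has already been reduced to per-file specs that list the symbols each file should define (`provides`) and import (`requires`). Second, the LLM is replaced by a stub `generate` that omits one planned symbol in its first draft. Third, when a later file needs a symbol that is still missing, the integrator backtracks to the earliest file planned to supply it and regenerates that file, passing the missing symbol in as rectification feedback. All names (`FileSpec`, `GeneratedFile`, `generate`, `build_project`, `max_backtracks`) are illustrative.

```python
from dataclasses import dataclass


# Hypothetical per-file spec, assumed to come from a prior multi-level planning step.
@dataclass
class FileSpec:
    path: str
    provides: set[str]  # symbols this file is planned to define
    requires: set[str]  # symbols this file expects other files to define


@dataclass
class GeneratedFile:
    path: str
    defines: set[str]


def generate(spec: FileSpec, feedback: set[str]) -> GeneratedFile:
    """Stub standing in for an LLM call (hypothetical). Without feedback it
    'forgets' one planned symbol; with feedback it emits the full plan."""
    defines = set(spec.provides)
    if not feedback and len(defines) > 1:
        defines.discard(min(defines))  # simulated omission in the first draft
    return GeneratedFile(spec.path, defines | (feedback & spec.provides))


def build_project(plan: list[FileSpec], max_backtracks: int = 5):
    """Generate files in plan order, backtracking when a dependency is missing."""
    accepted: list[GeneratedFile] = []  # accepted[k] always corresponds to plan[k]
    feedback: dict[str, set[str]] = {s.path: set() for s in plan}
    i, backtracks = 0, 0
    while i < len(plan):
        spec = plan[i]
        defined = set().union(*(f.defines for f in accepted))
        missing = spec.requires - defined
        if missing:
            # Earliest already-generated file that was planned to provide a missing symbol.
            owner = next((j for j, s in enumerate(plan[:i]) if s.provides & missing), None)
            if owner is None or backtracks >= max_backtracks:
                raise RuntimeError(f"cannot satisfy {sorted(missing)} for {spec.path}")
            feedback[plan[owner].path] |= plan[owner].provides & missing
            del accepted[owner:]  # roll back the owner and every file after it
            i, backtracks = owner, backtracks + 1
            continue
        accepted.append(generate(spec, feedback[spec.path]))
        i += 1
    return {f.path: f for f in accepted}, backtracks


plan = [
    FileSpec("models.py", provides={"User", "Order"}, requires=set()),
    FileSpec("service.py", provides={"place_order"}, requires={"User", "Order"}),
    FileSpec("app.py", provides={"main"}, requires={"place_order"}),
]
project, n_backtracks = build_project(plan)
for path, f in project.items():
    print(path, sorted(f.defines))
print("backtracks:", n_backtracks)
```

In this toy run, the first draft of `models.py` omits `Order`. The integrator detects the gap when it reaches `service.py`, rolls back to `models.py`, regenerates it with feedback, and then proceeds, so it makes exactly one backtrack. Rolling back everything after the repaired file is the simplest safe choice. A more selective strategy could keep downstream files that do not depend on the changed symbols.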
