Modularization is Better: Effective Code Generation with Modular Prompting

Ruwei Pan; Hongyu Zhang

TL;DR

MoT tackles complex programming tasks by structuring reasoning with a Multi-Level Reasoning (MLR) Graph that separates concerns and aligns reasoning steps with code. It works in two phases: MLR Graph Generation, which decomposes the problem, and Modular Code Generation, which implements each component by following the graph. Across six code-generation benchmarks and two advanced LLMs, MoT outperforms baselines such as CoT, SCoT, Self-planning, and CodeCoT, with substantial gains in Pass@1 and AvgPassRatio. The approach may also improve code quality and maintainability, and could reduce the cost of LLM-assisted software development.
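The two metrics named above can be illustrated with a minimal sketch. It assumes the commonly used definitions: with one generated program per problem, Pass@1 is the fraction of problems whose program passes every test, and AvgPassRatio is the mean, over problems, of the fraction of tests passed. The function names and the toy `results` data are hypothetical, and the paper's exact evaluation protocol may differ.

```python
from typing import List


def pass_at_1(results: List[List[bool]]) -> float:
    """Fraction of problems whose single generated program passes all tests.

    Assumption: one sample per problem; results[i][j] is True when the
    program for problem i passes test j.
    """
    solved = sum(1 for tests in results if tests and all(tests))
    return solved / len(results)


def avg_pass_ratio(results: List[List[bool]]) -> float:
    """Mean over problems of the fraction of tests passed."""
    ratios = [sum(tests) / len(tests) for tests in results]
    return sum(ratios) / len(ratios)


# Toy data (hypothetical): three problems with 3, 4, and 2 tests.
results = [
    [True, True, True],          # fully solved
    [True, False, True, True],   # partially correct
    [False, False],              # fails everything
]

print(f"Pass@1       = {pass_at_1(results):.3f}")
print(f"AvgPassRatio = {avg_pass_ratio(results):.3f}")
```

Under these definitions the partially correct program counts for nothing in Pass@1 but still contributes to AvgPassRatio, which is why the two metrics are usually reported together.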

Abstract

Large Language Models (LLMs) are transforming software development by automatically generating code. Current prompting techniques such as Chain-of-Thought (CoT) solve tasks step by step, so the reasoning process follows a linear structure. This hampers the understanding of complex programming problems, particularly those requiring hierarchical solutions. Inspired by the principle of modularization in software development, we propose a novel prompting technique, called MoT, to enhance the code generation performance of LLMs. First, MoT exploits modularization principles to decompose complex programming problems into smaller, independent reasoning steps, enabling a more structured and interpretable problem-solving process. Then, it structures the reasoning process using an MLR Graph (Multi-Level Reasoning Graph), which organizes the reasoning steps hierarchically. This hierarchical structure improves the LLM's ability to comprehend complex programming problems and ensures better alignment between the reasoning steps and the generated code. Our experiments on two advanced LLMs (GPT-4o-mini and DeepSeek-R1) compare MoT with six baseline prompting techniques across six widely used datasets: HumanEval, HumanEval-ET, HumanEval+, MBPP, MBPP-ET, and MBPP+. MoT significantly outperforms existing baselines (e.g., CoT and SCoT), achieving Pass@1 scores ranging from 58.1% to 95.1%. These results confirm that MoT significantly enhances the performance of LLM-based code generation.
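As a rough illustration of the idea, the sketch below represents a hierarchical reasoning graph as a tree of steps and turns it into a prompt that asks for one function per leaf step. This is not the authors' implementation: the class `ReasoningNode`, the helper functions, the prompt wording, and the toy task are all hypothetical, and no LLM is called.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReasoningNode:
    """One reasoning step; children refine it at the next level down."""
    name: str
    purpose: str
    children: List["ReasoningNode"] = field(default_factory=list)


def render_graph(node: ReasoningNode, depth: int = 0) -> List[str]:
    """Flatten the hierarchy into indented outline lines, parents first."""
    lines = [f"{'  ' * depth}- {node.name}: {node.purpose}"]
    for child in node.children:
        lines.extend(render_graph(child, depth + 1))
    return lines


def leaf_modules(node: ReasoningNode) -> List[str]:
    """Collect leaf steps; in this sketch each leaf maps to one function."""
    if not node.children:
        return [node.name]
    names: List[str] = []
    for child in node.children:
        names.extend(leaf_modules(child))
    return names


def build_codegen_prompt(task: str, graph: ReasoningNode) -> str:
    """Build a code-generation prompt that follows the reasoning graph."""
    outline = "\n".join(render_graph(graph))
    modules = ", ".join(leaf_modules(graph))
    return (
        f"Task: {task}\n"
        f"Reasoning graph:\n{outline}\n"
        f"Implement one function per leaf step ({modules}), "
        f"then compose them in the order given by the graph."
    )


# Hypothetical toy task decomposed into two independent steps.
graph = ReasoningNode(
    "solve",
    "return the sum of squares of the even numbers in a list",
    [
        ReasoningNode("filter_even", "keep only the even numbers"),
        ReasoningNode("sum_squares", "square each kept number and add them up"),
    ],
)
prompt = build_codegen_prompt("Sum the squares of even numbers.", graph)
print(prompt)
```

In this sketch the graph is written by hand; in a two-phase setup the first phase would ask the model to produce such a structure, and the second phase would pass it back alongside the task, so each generated function can be traced to a named reasoning step.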
