Data Dependency-Aware Code Generation from Enhanced UML Sequence Diagrams
Wenxin Mao, Zhitao Wang, Long Wang, Sirong Chen, Cuiyun Gao, Luyang Cao, Ziming Liu, Qiming Zhang, Jun Zhou, Zhi Jin
TL;DR
We address the problem of generating reliable code from complex software designs by decoupling data dependencies from control flow through a dedicated data dependency inference (DDI) step. The proposed UML2Dep framework enhances UML sequence diagrams with decision tables and refined API specifications, combines them with mathematical-formalization prompting, and applies reachability-based context pruning to reduce the cognitive load on LLMs. Across industrial datasets, UML2Dep achieves an average DDI recall of 89.97%, precision of 95.06%, and F1 score of 92.33%. It also significantly improves downstream code quality, increasing the compilation pass rate by 8.83% and the full unit-test pass rate by 11.66%. Validation on real-world microservice designs shows practical benefits for design validation, reliable code synthesis, and integration into industrial pipelines.
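DDI precision, recall, and F1 are typically computed by comparing the predicted dependency edges with a ground-truth set. The sketch below is not taken from UML2Dep. It assumes that each dependency is a hypothetical `(producer_step, consumer_step, field)` triple, and the helper `ddi_scores` and the step names are illustrative only.

```python
from typing import Set, Tuple

# Hypothetical edge format (assumption): (producer_step, consumer_step, field),
# meaning consumer_step reads `field` that producer_step outputs.
Edge = Tuple[str, str, str]


def ddi_scores(predicted: Set[Edge], gold: Set[Edge]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) of predicted dependency edges against gold edges."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = {
    ("createOrder", "reserveStock", "orderId"),
    ("createOrder", "chargePayment", "orderId"),
    ("reserveStock", "chargePayment", "reservationId"),
    ("chargePayment", "sendReceipt", "paymentId"),
}
# The prediction contains no spurious edges but misses one gold edge.
predicted = {
    ("createOrder", "reserveStock", "orderId"),
    ("createOrder", "chargePayment", "orderId"),
    ("chargePayment", "sendReceipt", "paymentId"),
}
p, r, f = ddi_scores(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")  # precision=1.00 recall=0.75 f1=0.86
```

Averaging per-dataset F1 scores and computing F1 from averaged precision and recall generally give different values. The reported 92.33% average F1 therefore need not equal the harmonic mean of the averaged precision and recall, which is about 92.4%.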
Abstract
Large language models (LLMs) excel at generating code from natural language (NL) descriptions. However, plain textual descriptions are inherently ambiguous and often fail to capture complex requirements such as intricate system behaviors, conditional logic, and architectural constraints. Moreover, implicit data dependencies in service-oriented architectures are difficult to infer and handle correctly. To bridge this gap, we propose UML2Dep, a novel step-by-step code generation framework that leverages unambiguous formal specifications of complex requirements. First, we introduce an enhanced Unified Modeling Language (UML) sequence diagram tailored to service-oriented architectures. This diagram extends the traditional visual syntax with decision tables and API specifications. It explicitly formalizes the structural relationships and business logic flows of service interactions to rigorously eliminate linguistic ambiguity. Second, recognizing the critical role of data flow, we introduce a dedicated data dependency inference (DDI) task, which systematically constructs an explicit data dependency graph before actual code synthesis. To ensure reliability, we formalize DDI as a constrained mathematical reasoning task through novel prompting strategies that exploit LLMs' strengths in mathematical reasoning. Additional static parsing and dependency pruning further reduce the context complexity and cognitive load associated with intricate specifications, thereby improving reasoning accuracy and efficiency.
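The following is a minimal sketch of an explicit data dependency graph with reachability-based pruning. For a given target step, it keeps only that step and the steps it transitively depends on. It is an illustration under stated assumptions, not UML2Dep's actual procedure. The edge format, the choice of backward reachability over data edges, and the helpers `build_graph` and `prune_context` are all hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

# Hypothetical edge format (assumption): (producer_step, consumer_step, field),
# meaning consumer_step reads `field` produced by producer_step.
Edge = Tuple[str, str, str]


def build_graph(edges: List[Edge]) -> Dict[str, Set[str]]:
    """Map each consumer step to the set of producer steps it depends on."""
    deps: Dict[str, Set[str]] = defaultdict(set)
    for producer, consumer, _field in edges:
        deps[consumer].add(producer)
    return deps


def prune_context(edges: List[Edge], target: str) -> Set[str]:
    """Return the target plus every step it transitively depends on (backward BFS)."""
    deps = build_graph(edges)
    keep = {target}
    queue = deque([target])
    while queue:
        step = queue.popleft()
        # .get avoids inserting empty entries into the defaultdict.
        for producer in deps.get(step, ()):
            if producer not in keep:
                keep.add(producer)
                queue.append(producer)
    return keep


edges = [
    ("validateUser", "createOrder", "userId"),
    ("createOrder", "reserveStock", "orderId"),
    ("reserveStock", "chargePayment", "reservationId"),
    ("createOrder", "sendReceipt", "orderId"),
    ("loadBanner", "renderHome", "bannerId"),
]
print(sorted(prune_context(edges, "chargePayment")))
# ['chargePayment', 'createOrder', 'reserveStock', 'validateUser']
```

In this toy example, `sendReceipt`, `loadBanner`, and `renderHome` are dropped from the context for `chargePayment` because no data flows from them into it. This is the kind of context reduction that the pruning step is meant to provide.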
