ARC: Compiling Hundreds of Requirement Scenarios into A Runnable Web System
Weiyu Kong, Yun Lin, Xiwen Teoh, Duc-Minh Nguyen, Ruofei Ren, Jiaxin Chang, Haoxu Hu, Haoyu Chen
TL;DR
ARC presents a requirement-centric alternative to stochastic code generation: it introduces a graph-based DSL that encodes multi-modal requirements as a directed acyclic graph. It employs a bidirectional, test-driven loop made up of a top-down architecture construction phase and a bottom-up constrained code generation phase, which enforces strict interface contracts and full traceability from requirements to code. Across six real-world web systems, ARC-generated systems pass 50.6% more GUI tests than baselines on average, and ARC's traceability records and modular interfaces make the generated systems easier to maintain. A user study with 21 novice participants shows that drafting requirements in the DSL is approachable and effective for compiling production-grade repositories, although participants raised concerns about compilation time and the need to state requirements explicitly. Overall, ARC demonstrates that formal requirement compilation can produce maintainable, runnable software and offers a scalable path for large-scale AI-assisted software engineering.
Abstract
Large Language Models (LLMs) have improved programming efficiency, but their performance degrades significantly as requirements scale: when faced with multi-modal documents containing hundreds of scenarios, LLMs often produce incorrect implementations or omit constraints. We propose Agentic Requirement Compilation (ARC), a technique that moves beyond simple code generation to requirement compilation, enabling the creation of runnable web systems directly from multi-modal DSL documents. ARC generates not only source code but also modular designs for the UI, API, and database layers, enriched test suites (unit, modular, and integration), and detailed traceability for software maintenance. Our approach employs a bidirectional, test-driven agentic loop: a top-down architecture phase decomposes requirements into verifiable interfaces, followed by a bottom-up implementation phase in which agents generate code that passes the tests attached to those interfaces. ARC maintains strict traceability across requirements, design, and code to facilitate intelligent asset reuse. We evaluated ARC by generating six runnable web systems from documents spanning 50–200 multi-modal scenarios. Compared to state-of-the-art baselines, ARC-generated systems pass 50.6% more GUI tests on average. A user study with 21 participants showed that novice users can successfully write DSL documents for complex systems, such as a 10K-line ticket-booking system, in an average of 5.6 hours. These results demonstrate that ARC effectively transforms non-trivial requirement specifications into maintainable, runnable software.
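The bidirectional loop relies on the requirement graph being acyclic, so that every component can be implemented after the components it depends on. The following Python sketch illustrates only the bottom-up half, under stated assumptions. The `Requirement` type, its single-example `contract`, `compile_requirements`, and `toy_implement` (a stand-in for an LLM agent) are all hypothetical and are not ARC's DSL or API. The sketch orders requirements with the standard-library `graphlib.TopologicalSorter`, accepts an implementation only if it satisfies its contract, and keeps a simple requirement-to-dependency trace.

```python
from dataclasses import dataclass, field
from graphlib import CycleError, TopologicalSorter


@dataclass
class Requirement:
    """Hypothetical requirement node (not ARC's DSL schema)."""
    rid: str
    contract: tuple  # (example args, expected result), assumed fixed top-down
    depends_on: list = field(default_factory=list)


def compile_requirements(reqs, implement):
    """Bottom-up phase: implement each requirement after its dependencies,
    accepting an implementation only if it satisfies the requirement's contract."""
    graph = {rid: set(r.depends_on) for rid, r in reqs.items()}
    # static_order() yields every node after all of its dependencies and
    # raises CycleError if the requirement graph is not acyclic.
    order = list(TopologicalSorter(graph).static_order())
    impls, trace = {}, {}
    for rid in order:
        req = reqs[rid]
        deps = {d: impls[d] for d in req.depends_on}  # already verified
        fn = implement(req, deps)
        args, expected = req.contract
        if fn(*args) != expected:
            raise ValueError(f"{rid}: implementation violates its contract")
        impls[rid] = fn
        trace[rid] = list(req.depends_on)  # requirement -> upstream requirements
    return impls, trace, order


# Toy requirements loosely modeled on a ticket-booking flow (illustrative only).
reqs = {
    "base_fare": Requirement("base_fare", ((2,), 100)),
    "discount": Requirement("discount", ((2,), 90), ["base_fare"]),
    "checkout": Requirement("checkout", ((2,), "PAID 90"), ["discount"]),
}


def toy_implement(req, deps):
    """Hypothetical stand-in for an LLM code-generation agent: returns
    hand-written functions that call only the dependency implementations."""
    if req.rid == "base_fare":
        return lambda seats: 50 * seats
    if req.rid == "discount":
        return lambda seats: deps["base_fare"](seats) * 9 // 10
    if req.rid == "checkout":
        return lambda seats: f"PAID {deps['discount'](seats)}"
    raise KeyError(req.rid)


impls, trace, order = compile_requirements(reqs, toy_implement)
print(order)                 # ['base_fare', 'discount', 'checkout']
print(impls["checkout"](4))  # PAID 180
```

Because `static_order()` raises `CycleError` on cyclic input, the sketch rejects a non-acyclic requirement graph before any code is generated. The single example contract is a simplification: the abstract describes ARC as producing unit, modular, and integration test suites instead.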
