Automatically Generating Web Applications from Requirements Via Multi-Agent Test-Driven Development

Yuxuan Wan, Tingshuo Liang, Jiakai Xu, Jingyu Xiao, Yintong Huo, Michael R. Lyu

TL;DR

TDDev presents the first TDD-enabled multi-agent framework for automatic end-to-end full-stack web development from natural language or design images. It assembles three specialized agents—test-case generation, development, and testing—into a cohesive Req-to-App workflow that iteratively refines implementations based on executable tests and user-simulation feedback. Experimental results on the multimodal Req-to-App-MM benchmark show TDDev achieves higher accuracy, lower failure rates, and competitive visual fidelity compared with baselines, while substantially reducing manual developer effort. This work demonstrates the practical potential of integrating TDD with LLM agents to automate reliable, visually faithful, full-stack web applications at scale, with clear directions for further improvements in open-source model support and scalability of feedback cycles.

Abstract

Developing full-stack web applications is complex and time-intensive, demanding proficiency across diverse technologies and frameworks. Although recent advances in multimodal large language models (MLLMs) enable automated webpage generation from visual inputs, current solutions remain limited to front-end tasks and fail to deliver fully functional applications. In this work, we introduce TDDev, the first test-driven development (TDD)-enabled LLM-agent framework for end-to-end full-stack web application generation. Given a natural language description or design image, TDDev automatically derives executable test cases, generates front-end and back-end code, simulates user interactions, and iteratively refines the implementation until all requirements are satisfied. Our framework addresses key challenges in full-stack automation, including underspecified user requirements, complex interdependencies among multiple files, and the need for both functional correctness and visual fidelity. Through extensive experiments on diverse application scenarios, TDDev achieves a 14.4% improvement in overall accuracy over state-of-the-art baselines, demonstrating its effectiveness in producing reliable, high-quality web applications without requiring manual intervention.
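The test-first loop described in the abstract can be illustrated with a minimal sketch. This is not TDDev's implementation: the names `CheckCase`, `tdd_loop`, `toy_derive_tests`, and `toy_develop` are hypothetical, the "agents" are plain Python stubs rather than LLM calls, and an application is modeled as a simple mapping from file names to contents. The sketch only shows the control flow: derive tests from the requirement, generate code, run the tests, and feed failures back until everything passes or a round budget is exhausted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

App = Dict[str, str]  # hypothetical model of an app: file name -> file contents


@dataclass
class CheckCase:
    """One executable check derived from the requirement (hypothetical type)."""
    name: str
    check: Callable[[App], bool]  # returns True when the app satisfies the check


def tdd_loop(
    requirement: str,
    derive_tests: Callable[[str], List[CheckCase]],
    develop: Callable[[str, App, List[str]], App],
    max_rounds: int = 5,
) -> Tuple[App, int, List[str]]:
    """Derive tests first, then alternate development and testing."""
    tests = derive_tests(requirement)          # test-case generation step
    app: App = {}
    failing: List[str] = []
    for round_no in range(1, max_rounds + 1):
        app = develop(requirement, app, failing)                 # development step
        failing = [t.name for t in tests if not t.check(app)]    # testing step
        if not failing:
            return app, round_no, []
    return app, max_rounds, failing            # budget exhausted, report failures


# Toy stand-ins for the agents, for illustration only.
def toy_derive_tests(requirement: str) -> List[CheckCase]:
    return [
        CheckCase("has_frontend", lambda app: "index.html" in app),
        CheckCase("has_backend", lambda app: "server.py" in app),
    ]


def toy_develop(requirement: str, app: App, failing: List[str]) -> App:
    updated = dict(app)
    if "index.html" not in updated:
        updated["index.html"] = "<h1>" + requirement + "</h1>"
    elif "has_backend" in failing:
        updated["server.py"] = "# routes go here"
    return updated


app, rounds, failing = tdd_loop("todo list", toy_derive_tests, toy_develop)
print(rounds, sorted(app), failing)
```

In this toy run the first round produces only the front-end file, the failing back-end check is passed back as feedback, and the second round resolves it. The round cap is an assumption added here so that the loop always terminates.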

Paper Structure

This paper contains 46 sections, 4 figures, 11 tables.

Figures (4)

  • Figure 1: Comparison between our proposed TDDev framework (lower) and current industry tools (upper).
  • Figure 2: The workflow of the test generation agent.
  • Figure 3: Workflow of the development agent.
  • Figure 4: An example data instance. Only 1 of the 7 test cases is shown.