WebVIA: A Web-based Vision-Language Agentic Framework for Interactive and Verifiable UI-to-Code Generation

Mingde Xu, Zhen Yang, Wenyi Hong, Lihang Pan, Xinyue Fan, Yan Wang, Xiaotao Gu, Bin Xu, Jie Tang

TL;DR

WebVIA addresses a gap in UI-to-Code generation by introducing an agentic, multi-component pipeline that explores a UI interactively, generates executable and interactive front-end code, and validates that interactivity. The framework consists of WebVIA-Agent for stateful UI discovery, WebVIA-UI2Code for producing behavior-preserving code, and a validation module that verifies end-to-end interactivity, all trained on large synthetic GUI datasets. Experimental results show better stability, accuracy, and interactivity than strong baselines, with significant gains on both interactive and static UI2Code benchmarks and more frequently valid, executable code, illustrating the feasibility of scalable, verifiable UI synthesis. The approach has practical implications for reducing manual UI development and enabling robust, testable front-end workflows.
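The TL;DR describes WebVIA-Agent only as performing "stateful UI discovery." To illustrate what exploring a UI statefully can mean, the sketch below models a page as a hypothetical map from each UI state to the actions available there and the states they lead to. It then runs a breadth-first search that records the shortest action trace reaching every distinct state. This is not the authors' agent, which is a trained vision-language model acting on real screenshots. The names `explore`, `max_depth`, and the toy `page` are assumptions for illustration only.

```python
from collections import deque
from typing import Dict, List

# Hypothetical toy model of a webpage: each UI state maps the actions
# available in it (e.g. "click:menu") to the state each action leads to.
Transitions = Dict[str, Dict[str, str]]


def explore(transitions: Transitions, start: str, max_depth: int = 3) -> Dict[str, List[str]]:
    """Breadth-first discovery of distinct UI states.

    Returns, for every state reachable within `max_depth` actions, the
    shortest action trace that reaches it from `start`. A real agent would
    capture a screenshot at each newly discovered state; here the state
    name stands in for that screenshot.
    """
    traces: Dict[str, List[str]] = {start: []}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        trace = traces[state]
        if len(trace) >= max_depth:  # interaction budget exhausted on this path
            continue
        for action, nxt in transitions.get(state, {}).items():
            if nxt not in traces:  # record only states not seen before
                traces[nxt] = trace + [action]
                queue.append(nxt)
    return traces


page: Transitions = {
    "home": {"click:menu": "menu_open", "click:login": "login_modal"},
    "menu_open": {"click:close": "home", "click:settings": "settings"},
    "login_modal": {"click:cancel": "home"},
    "settings": {},
}

for state, trace in explore(page, "home").items():
    print(state, trace)
# home []
# menu_open ['click:menu']
# login_modal ['click:login']
# settings ['click:menu', 'click:settings']
```

Each recorded trace is what an exploration agent could replay to reach and capture a given state. The depth bound stands in for the limited interaction budget of a real exploration run.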

Abstract

User interface (UI) development requires translating design mockups into functional code, a process that remains repetitive and labor-intensive. While recent Vision-Language Models (VLMs) automate UI-to-Code generation, they produce only static HTML/CSS/JavaScript layouts that lack interactivity. To address this, we propose WebVIA, the first agentic framework for interactive UI-to-Code generation and validation. The framework comprises three components: 1) an exploration agent that captures multi-state UI screenshots; 2) a UI2Code model that generates executable interactive code; and 3) a validation module that verifies the interactivity. Experiments demonstrate that WebVIA-Agent achieves more stable and accurate UI exploration than general-purpose agents (e.g., Gemini-2.5-Pro). In addition, our fine-tuned WebVIA-UI2Code models show substantial improvements in generating executable and interactive HTML/CSS/JavaScript code, outperforming their base counterparts on both interactive and static UI2Code benchmarks. Our code and models are available at https://webvia.github.io (https://zheny2751-dotcom.github.io/webvia.github.io/).
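The abstract does not say how the validation module decides that generated code is interactive. One way to frame the check, as a minimal sketch, assumes the exploration step yields a set of action traces and that each UI state can be identified by a comparable key (a real system would compare screenshots or DOM snapshots instead). Validation then replays every trace on the generated page and checks that it ends in the same state as on the reference page. The functions `replay` and `interactivity_score` and the toy transition maps are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional

# Hypothetical toy model: UI state -> {action: next UI state}.
Transitions = Dict[str, Dict[str, str]]


def replay(transitions: Transitions, start: str, trace: List[str]) -> Optional[str]:
    """Apply a sequence of actions and return the final state, or None if
    some action is unavailable in the current state (missing interaction)."""
    state = start
    for action in trace:
        nxt = transitions.get(state, {}).get(action)
        if nxt is None:
            return None
        state = nxt
    return state


def interactivity_score(reference: Transitions, generated: Transitions,
                        start: str, traces: List[List[str]]) -> float:
    """Fraction of traces that end in the same state on both pages."""
    if not traces:
        return 1.0
    passed = 0
    for trace in traces:
        gen_end = replay(generated, start, trace)
        if gen_end is not None and gen_end == replay(reference, start, trace):
            passed += 1
    return passed / len(traces)


reference: Transitions = {
    "home": {"click:menu": "menu_open", "click:login": "login_modal"},
    "menu_open": {"click:close": "home", "click:settings": "settings"},
    "login_modal": {"click:cancel": "home"},
    "settings": {},
}
# Generated page that forgot the "settings" link inside the menu.
generated: Transitions = {
    "home": {"click:menu": "menu_open", "click:login": "login_modal"},
    "menu_open": {"click:close": "home"},
    "login_modal": {"click:cancel": "home"},
}
traces = [["click:menu"], ["click:login"],
          ["click:menu", "click:settings"], ["click:login", "click:cancel"]]

print(interactivity_score(reference, generated, "home", traces))    # 0.75
print(interactivity_score(reference, {"home": {}}, "home", traces))  # 0.0
```

A purely static reproduction (a single state with no actions) scores 0.0 under this check even if it looks pixel-identical in its initial state, which is the gap between static and interactive generation that the abstract describes.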

Paper Structure

This paper contains 30 sections, 2 equations, 30 figures, 5 tables.

Figures (30)

  • Figure 1: Motivating example illustrating the gap between static and interactive code generation.
  • Figure 2: Overview of the WebVIA framework, which comprises three components: (a) an exploration agent to capture multi-state UI screenshots; (b) a UI2Code model to generate interactive code; (c) a validation module to verify the interactivity.
  • Figure 3: Correlation between the mean interaction trace length and the overall exploration score across our WebVIA-Agent and various VLMs.
  • Figure 4: Overview of the webpage synthesis process in the WebVIA framework.
  • Figure 5: Template used to construct webpage design prompts for generating interactive webpage HTML code.