
WebRenderBench: Enhancing Web Interface Generation through Layout-Style Consistency and Reinforcement Learning

Peichao Lai, Jinhui Zhuang, Kexuan Zhang, Ningchang Xiong, Shengjie Wang, Yanwei Xu, Chong Chen, Yilei Wang, Bin Cui

TL;DR

WebRenderBench addresses the core challenges of WebUI-to-Code by providing a large-scale, real-world dataset and a robust, render-based metric for assessing layout and style fidelity. It also introduces ALISA, an automated agent that uses the proposed RDA, GDA, and SDA metrics as reinforcement learning rewards to improve code generation on asymmetric webpages. Experiments show state-of-the-art performance across multiple metrics and highlight the importance of layout alignment, and ablations confirm the benefit of combining layout and style signals. Together, the benchmark and RL framework enable more reliable, objective assessment and improvement of WebUI-to-Code generation for real-world web designs.
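
The summary says the RDA, GDA, and SDA metrics act as RL rewards, and Figure 5 studies how their weights affect performance, but the exact combination rule is not given here. The sketch below assumes each metric has already been converted to a score in [0, 1] where higher means closer to the reference page, and mixes them as a normalized weighted mean. The function name combined_reward, the default weights, and the "no reward if the page fails to render" rule are hypothetical illustrations, not the paper's formulation.

```python
def combined_reward(rda: float, gda: float, sda: float,
                    weights: tuple[float, float, float] = (1.0, 1.0, 1.0),
                    rendered: bool = True) -> float:
    """Hypothetical scalar reward: weighted mean of three metric scores in [0, 1]."""
    if not rendered:
        # Assumption: generated code that fails to render earns zero reward.
        return 0.0
    scores = (rda, gda, sda)
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"metric score {s} is outside [0, 1]")
    total_w = sum(weights)
    if total_w <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalizing by the weight sum keeps the reward in [0, 1] for any positive weights.
    return sum(w * s for w, s in zip(weights, scores)) / total_w


# Usage: emphasize the first (layout) signal twice as much as the others.
print(combined_reward(0.8, 0.6, 0.4, weights=(2.0, 1.0, 1.0)))
```

Normalizing by the weight sum means that sweeping the weights, as in a Figure 5 style ablation, changes the balance between signals without changing the reward's scale.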

Abstract

Automating the conversion of UI images into web code is a critical task for front-end development and rapid prototyping. Advances in multimodal large language models (MLLMs) have made WebUI-to-Code increasingly feasible, yet existing benchmarks remain limited in data diversity and evaluation reliability. To address these issues, we present WebRenderBench, a large-scale benchmark of 45.1k webpages collected from real-world portal sites, offering greater diversity, complexity, and realism than prior benchmarks. We further propose a novel evaluation metric that measures layout and style consistency on the final rendered pages. Unlike vision-based methods, which rely on costly LLM reasoning, or structure-based comparisons, which are vulnerable to noise and asymmetry, our approach enables more efficient, objective, and reliable UI quality assessment. Finally, we introduce the Automated Layout and Style Inspection Agent (ALISA), which integrates this metric into reinforcement learning as a reward signal to enhance training on crawled asymmetric webpages. Experiments show that ALISA significantly boosts generation performance, achieving state-of-the-art results across multiple metrics.
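
The abstract describes the metric only as measuring layout and style consistency on rendered pages. As a rough illustration of what such a render-based comparison can look like, the sketch below assumes each rendered page has already been reduced to a list of elements carrying a matching key, a page-normalized bounding box, and a few computed style properties (for example collected with the DOM's getBoundingClientRect() and getComputedStyle() in a headless browser). It pairs elements that share a key, reports the ratio of matched reference elements, scores layout by the mean IoU of matched boxes, and scores style by the fraction of equal properties. The Element type, the keying rule, and all three scores are hypothetical stand-ins, not the paper's RDA, GDA, or SDA definitions.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    key: str                                # hypothetical matching key, e.g. tag + visible text
    box: tuple[float, float, float, float]  # (x, y, width, height), normalized to page size
    style: dict[str, str] = field(default_factory=dict)  # computed CSS property -> value


def iou(a, b) -> float:
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def match_elements(ref, gen):
    """Pair each reference element with the first unused generated element sharing its key."""
    pool: dict[str, list[Element]] = {}
    for e in gen:
        pool.setdefault(e.key, []).append(e)
    pairs = []
    for r in ref:
        candidates = pool.get(r.key)
        if candidates:
            pairs.append((r, candidates.pop(0)))
    return pairs


def consistency_scores(ref, gen) -> dict[str, float]:
    """Hypothetical match ratio, layout score, and style score for two rendered pages."""
    pairs = match_elements(ref, gen)
    if not ref or not pairs:
        return {"match_ratio": 0.0, "layout": 0.0, "style": 0.0}
    match_ratio = len(pairs) / len(ref)
    layout = sum(iou(r.box, g.box) for r, g in pairs) / len(pairs)
    style_scores = []
    for r, g in pairs:
        if r.style:
            same = sum(1 for p, v in r.style.items() if g.style.get(p) == v)
            style_scores.append(same / len(r.style))
        else:
            style_scores.append(1.0)  # nothing to compare counts as consistent
    style = sum(style_scores) / len(style_scores)
    return {"match_ratio": match_ratio, "layout": layout, "style": style}


# Usage: the generated page reproduces the heading's position but not its font size,
# and omits the login link entirely.
ref = [Element("h1:Welcome", (0.1, 0.05, 0.8, 0.1), {"color": "rgb(0, 0, 0)", "font-size": "32px"}),
       Element("a:Login", (0.85, 0.0, 0.1, 0.05), {"color": "rgb(0, 0, 255)"})]
gen = [Element("h1:Welcome", (0.1, 0.05, 0.8, 0.1), {"color": "rgb(0, 0, 0)", "font-size": "28px"})]
print(consistency_scores(ref, gen))
```

Working only from rendered geometry and computed styles, rather than from source markup, is what lets a comparison of this kind tolerate differences in how two pages are written as long as they render alike.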

Paper Structure

This paper contains 19 sections, 4 equations, 7 figures, 6 tables, 2 algorithms.

Figures (7)

  • Figure 1: Dataset construction pipeline and the architecture of the ALISA framework.
  • Figure 2: Sunburst chart showing the distribution of the top 10 industries by sample count across different Group Counts in our WebRenderBench test set.
  • Figure 3: Ratio of Matched Associated Elements across different Group Count ranges.
  • Figure 4: Example prompts for VLM inference and training, where the purple sections indicate editable variables. An <image> placeholder is also prepended to the input.
  • Figure 5: Impact of RDA, GDA, and SDA reward weights on model performance using Qwen2.5-VL-3B-Instruct as the backbone model.
  • ...and 2 more figures