UI2Code^N: A Visual Language Model for Test-Time Scalable Interactive UI-to-Code Generation

Zhen Yang, Wenyi Hong, Mingde Xu, Xinyue Fan, Weihan Wang, Jiele Cheng, Xiaotao Gu, Jie Tang

TL;DR

This work tackles UI-to-code generation by rethinking it as an interactive, iterative process rather than a one-shot task. It introduces UI2Code^N and the Interactive UI-to-Code paradigm, unifying UI-to-code, UI editing, and UI polishing to enable test-time scaling through multiple refinement rounds. The model is trained via a three-stage pipeline—continual pre-training on real-world web data, supervised fine-tuning on curated data, and reinforcement learning with a GLM-4.5V verifier—to achieve strong multimodal coding performance and robust polishing capabilities. Experiments on UI-to-code and UI polishing benchmarks show state-of-the-art results among open-source models and competitive performance compared with leading closed-source systems, with real-world webpage data enhancing robustness and realism.

Abstract

User interface (UI) programming is a core yet highly complex part of modern software development. Recent advances in visual language models (VLMs) highlight the potential of automatic UI coding, but current approaches face two key limitations: multimodal coding capabilities remain underdeveloped, and single-turn paradigms make little use of iterative visual feedback. We address these challenges with an interactive UI-to-code paradigm that better reflects real-world workflows and raises the upper bound of achievable performance. Under this paradigm, we present UI2Code$^\text{N}$, a visual language model trained through staged pretraining, fine-tuning, and reinforcement learning to achieve foundational improvements in multimodal coding. The model unifies three key capabilities: UI-to-code generation, UI editing, and UI polishing. We further explore test-time scaling for interactive generation, enabling systematic use of multi-turn feedback. Experiments on UI-to-code and UI polishing benchmarks show that UI2Code$^\text{N}$ establishes a new state of the art among open-source models and achieves performance comparable to leading closed-source models such as Claude-4-Sonnet and GPT-5. Our code and models are available at https://github.com/zai-org/UI2Code_N.
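The multi-turn idea described above can be illustrated as a simple loop: draft code from a screenshot, render it, then polish over several rounds while keeping the best-scoring candidate. This is only a minimal sketch under stated assumptions, not the authors' implementation. Every name here (`interactive_ui_to_code`, `generate`, `render`, `polish`, `score`) is hypothetical, the best-candidate selection rule is an assumption, and the toy stand-ins operate on strings rather than real screenshots or models.

```python
from typing import Callable, Tuple


def interactive_ui_to_code(
    target: str,
    generate: Callable[[str], str],
    render: Callable[[str], str],
    polish: Callable[[str, str, str], str],
    score: Callable[[str, str], float],
    rounds: int = 3,
) -> Tuple[str, float]:
    """Draft code once, then polish it `rounds` times, keeping the best candidate.

    All callables are hypothetical placeholders for a model and a renderer.
    """
    code = generate(target)
    best_code, best_score = code, score(target, render(code))
    for _ in range(rounds):
        # Polishing sees the target, the current best code, and its rendering.
        candidate = polish(target, best_code, render(best_code))
        candidate_score = score(target, render(candidate))
        if candidate_score > best_score:
            best_code, best_score = candidate, candidate_score
    return best_code, best_score


# Toy stand-ins (assumptions for illustration): the "screenshot" is a string,
# rendering is the identity, and each polish round repairs one more character.
def toy_generate(target: str) -> str:
    return target[: len(target) // 2]


def toy_render(code: str) -> str:
    return code


def toy_polish(target: str, code: str, rendered: str) -> str:
    return code + target[len(rendered): len(rendered) + 1]


def toy_score(target: str, rendered: str) -> float:
    matches = sum(a == b for a, b in zip(target, rendered))
    return matches / len(target)


if __name__ == "__main__":
    for n in (0, 1, 3):
        result = interactive_ui_to_code(
            "abcdef", toy_generate, toy_render, toy_polish, toy_score, rounds=n
        )
        print(n, result)
```

In this toy setting the score never decreases as `rounds` grows, which is the shape a test-time scaling curve would take; with a real model, the gain per round would depend on the polishing model and the scoring function.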

Paper Structure

This paper contains 38 sections, 3 equations, 8 figures, 4 tables, 3 algorithms.

Figures (8)

  • Figure 1: Top: Comparison of UI-to-code generation outputs from leading models versus our model, using the same reference screenshot. Our model achieves the highest fidelity, further enhanced by our UI polishing capability. Additional qualitative examples with diverse content, aspect ratios, and layouts are provided in the appendix (demo cases). Bottom left: Performance comparison on UI-to-code and UI polishing tasks. Bottom right: Test-time scaling curve of our model on the UI-to-code task, enabled by our interactive UI-to-code paradigm.
  • Figure 2: Our interactive UI-to-code paradigm integrates UI-to-code, UI polishing, and UI editing. Iterative polishing enables continuous refinement, achieving test-time scaling for the UI-to-code task.
  • Figure 3: UI2Code$^\text{N}$ Demo Cases: UI-to-code (1/4)
  • Figure 4: UI2Code$^\text{N}$ Demo Cases: UI-to-code (2/4)
  • Figure 5: UI2Code$^\text{N}$ Demo Cases: UI-to-code (3/4)
  • ...and 3 more figures