UI2Code^N: A Visual Language Model for Test-Time Scalable Interactive UI-to-Code Generation

Zhen Yang; Wenyi Hong; Mingde Xu; Xinyue Fan; Weihan Wang; Jiele Cheng; Xiaotao Gu; Jie Tang

TL;DR

This work recasts UI-to-code generation as an interactive, iterative process rather than a one-shot task. It introduces UI2Code^N and the Interactive UI-to-Code paradigm, which unifies UI-to-code, UI editing, and UI polishing so that performance can scale at test time through multiple refinement rounds. The model is trained with a three-stage pipeline (continual pre-training on real-world web data, supervised fine-tuning on curated data, and reinforcement learning with a GLM-4.5V verifier) to achieve strong multimodal coding performance and robust polishing capabilities. Experiments on UI-to-code and UI polishing benchmarks show state-of-the-art results among open-source models and performance competitive with leading closed-source systems. Real-world webpage data further improves robustness and realism.

Abstract

User interface (UI) programming is a core yet highly complex part of modern software development. Recent advances in visual language models (VLMs) highlight the potential of automatic UI coding, but current approaches face two key limitations: multimodal coding capabilities remain underdeveloped, and single-turn paradigms make little use of iterative visual feedback. We address these challenges with an interactive UI-to-code paradigm that better reflects real-world workflows and raises the upper bound of achievable performance. Under this paradigm, we present UI2Code^N, a visual language model trained through staged pretraining, fine-tuning, and reinforcement learning to achieve foundational improvements in multimodal coding. The model unifies three key capabilities: UI-to-code generation, UI editing, and UI polishing. We further explore test-time scaling for interactive generation, enabling systematic use of multi-turn feedback. Experiments on UI-to-code and UI polishing benchmarks show that UI2Code^N establishes a new state of the art among open-source models and achieves performance comparable to leading closed-source models such as Claude-4-Sonnet and GPT-5. Our code and models are available at https://github.com/zai-org/UI2Code_N.
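The interactive paradigm described above (draft once, then polish over several rounds using rendered visual feedback) can be illustrated with a minimal sketch. This is not the authors' implementation. The function `refine_ui_code` and its callables `generate`, `polish`, `render`, and `score` are hypothetical placeholders, and keeping the best-scoring candidate across rounds is an assumption made for illustration.

```python
from typing import Callable, Tuple


def refine_ui_code(
    target: str,
    generate: Callable[[str], str],
    polish: Callable[[str, str, str], str],
    render: Callable[[str], str],
    score: Callable[[str, str], float],
    rounds: int = 3,
) -> Tuple[str, float]:
    """Hypothetical N-round loop: draft once, then polish with rendered feedback.

    All callables are stand-ins (assumptions, not a real API):
      generate(target)                 -> first code draft
      render(code)                     -> rendering of that code
      polish(target, code, rendering)  -> revised code
      score(target, rendering)         -> similarity, higher is better
    """
    best_code = generate(target)
    best_score = score(target, render(best_code))
    for _ in range(rounds):
        # Show the model the target, the current best code, and how it renders.
        candidate = polish(target, best_code, render(best_code))
        candidate_score = score(target, render(candidate))
        # Keep a revision only if it moves the rendering closer to the target.
        if candidate_score > best_score:
            best_code, best_score = candidate, candidate_score
    return best_code, best_score
```

In a real setting, `target` would be a UI screenshot, `render` would drive a browser to capture the generated page, and `generate` and `polish` would call the model. Increasing `rounds` is the test-time scaling knob in this sketch.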
