GUI-GENESIS: Automated Synthesis of Efficient Environments with Verifiable Rewards for GUI Agent Post-Training

Yuan Cao; Dezhi Ran; Mengzhou Wu; Yuzhe Guo; Xin Chen; Ang Li; Gang Cao; Gong Zhi; Hao Yu; Linyi Li; Wei Yang; Tao Xie

TL;DR

GUI-GENESIS tackles the efficiency and verifiability bottlenecks in GUI agent training by synthesizing task-specific, executable web environments with code-native rewards. The approach combines trace-driven context acquisition, hierarchical code synthesis with meta-prompting, and automated verification to produce deterministic feedback and near-instant simulation. Empirical results on WeChat Mini-Apps show strong sim-to-real transfer: training with code-native rewards yields the largest gains in real-world success rate (SR) while substantially reducing latency and cost compared with real-world training. A notable finding is the synthesis-navigation gap, in which a model can synthesize environments it cannot yet navigate. This points to self-improving agents whose generator and agent co-evolve to tackle increasingly challenging tasks.

Abstract

Post-training GUI agents in interactive environments is critical for developing generalization and long-horizon planning capabilities. However, training on real-world applications is hindered by high latency, poor reproducibility, and unverifiable rewards that rely on noisy visual proxies. To address these limitations, we present GUI-GENESIS, the first framework to automatically synthesize efficient GUI training environments with verifiable rewards. GUI-GENESIS reconstructs real-world applications into lightweight web environments using multimodal code models and equips them with code-native rewards: executable assertions that provide deterministic reward signals and eliminate visual estimation noise. Extensive experiments show that GUI-GENESIS reduces environment latency tenfold and cuts costs by over $28,000 per epoch compared to training on real applications. Notably, agents trained with GUI-GENESIS outperform the base model by 14.54% and even real-world RL baselines by 3.27% on held-out real-world tasks. Finally, we observe that models can synthesize environments they cannot yet solve, highlighting a pathway for self-improving agents.
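To make the idea of a code-native reward concrete, the following is a minimal sketch rather than the paper's implementation. It assumes the synthesized environment exposes its final application state as a plain dictionary and that the reward is the fraction of executable checks that pass. The `Check` type, the `code_native_reward` function, the state keys, and the partial-credit scoring are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumption: the environment's state is readable as a plain dict.
AppState = Dict[str, object]


@dataclass
class Check:
    """One executable assertion over the application state (hypothetical type)."""
    name: str
    predicate: Callable[[AppState], bool]


def code_native_reward(state: AppState, checks: List[Check]) -> float:
    """Return the fraction of checks that pass on the given state.

    The result depends only on the state, so identical states always
    receive identical rewards. Partial credit is an assumption here.
    """
    if not checks:
        return 0.0
    passed = sum(1 for check in checks if check.predicate(state))
    return passed / len(checks)


# Hypothetical task: "add two coffees to the cart and place the order".
checks = [
    Check("item_in_cart", lambda s: s.get("cart_item") == "coffee"),
    Check("quantity_is_two", lambda s: s.get("quantity") == 2),
    Check("order_placed", lambda s: s.get("order_status") == "placed"),
]

# The agent filled the cart correctly but never submitted the order.
final_state = {"cart_item": "coffee", "quantity": 2, "order_status": "pending"}
reward = code_native_reward(final_state, checks)
```

In this sketch the reward is computed from program state rather than estimated from a screenshot, which is the property the abstract contrasts with noisy visual proxies.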

Paper Structure (31 sections, 1 equation, 8 figures, 3 tables)

Introduction
Background and Related Work
Problem Formulation
Methodology
Overview
Trace-Driven Context Acquisition
Hierarchical Code Synthesis
Meta-prompting for system design.
Plan-and-execute implementation.
Code-Native Reward Injection
Automated Self-Verification
Static Self-Reflection.
Dynamic Playwright Testing.
Experiment Setup
Datasets and Tasks.
...and 16 more sections

Figures (8)

Figure 1: Overview of GUI-GENESIS.
Figure 2: The code-native reward correlates with the VLM-as-a-judge reward but provides more accurate and fine-grained reward values.
Figure 3: Distribution of trajectory lengths. A comparison of the number of steps (unique screenshots) required to solve tasks in real-world vs. synthesized environments. The synthesized environments exhibit a trajectory length distribution similar to that of the real-world ones, implying that they do not trivialize the task logic.
Figure 4: Scaling the number of synthetic environments yields continuous improvement, as verified by both VLM SR and code-native SR on the synthesized evaluation dataset.
Figure 5: Case studies in which the code model generates applications that it cannot itself navigate successfully.
...and 3 more figures
