GUI-GENESIS: Automated Synthesis of Efficient Environments with Verifiable Rewards for GUI Agent Post-Training

Yuan Cao, Dezhi Ran, Mengzhou Wu, Yuzhe Guo, Xin Chen, Ang Li, Gang Cao, Gong Zhi, Hao Yu, Linyi Li, Wei Yang, Tao Xie

TL;DR

GUI-GENESIS tackles the efficiency and verifiability bottlenecks in GUI agent training by synthesizing task-specific, executable web environments with code-native rewards. The approach combines trace-driven context, hierarchical code synthesis with meta-prompting, and automated verification to produce deterministic feedback and near-instant simulation. Experiments on WeChat Mini-Apps show strong sim-to-real transfer: code-native rewards yield the largest gains in real-world success rate (SR), with substantially lower latency and cost than real-world training. A notable finding is the synthesis-navigation gap: models can generate environments that they cannot yet navigate, which suggests a route to self-improving agents in which the generator and the agent co-evolve to tackle increasingly challenging tasks.

Abstract

Post-training GUI agents in interactive environments is critical for developing generalization and long-horizon planning capabilities. However, training on real-world applications is hindered by high latency, poor reproducibility, and unverifiable rewards that rely on noisy visual proxies. To address these limitations, we present GUI-GENESIS, the first framework to automatically synthesize efficient GUI training environments with verifiable rewards. GUI-GENESIS reconstructs real-world applications into lightweight web environments using multimodal code models and equips them with code-native rewards: executable assertions that provide deterministic reward signals and eliminate visual estimation noise. Extensive experiments show that GUI-GENESIS reduces environment latency tenfold and saves over $28,000 per epoch compared to training on real applications. Notably, agents trained with GUI-GENESIS outperform the base model by 14.54% and even real-world RL baselines by 3.27% on held-out real-world tasks. Finally, we observe that models can synthesize environments they cannot yet solve, highlighting a pathway for self-improving agents.
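
To make the idea of a code-native reward concrete, the following is a minimal sketch of an executable check over an environment's internal state. It is an illustration under stated assumptions, not the paper's implementation: the `AppState` class, its fields, the example task ("add two coffees to the cart and submit the order"), and the `code_native_reward` function are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class AppState:
    """Hypothetical internal state of a synthesized shopping environment."""
    cart: dict = field(default_factory=dict)   # item name -> quantity
    order_submitted: bool = False


def code_native_reward(state: AppState) -> float:
    """Score a hypothetical task: 'add two coffees to the cart and submit'.

    Each check is a deterministic predicate on the application state, so the
    same state always yields the same reward, with no visual estimation.
    The reward is the fraction of checks that pass (an assumed scoring rule).
    """
    checks = [
        state.cart.get("coffee", 0) == 2,  # correct item and quantity
        state.order_submitted,             # order actually submitted
    ]
    return sum(checks) / len(checks)


# Usage: partial progress earns partial credit; full completion earns 1.0.
partial = AppState(cart={"coffee": 2})
done = AppState(cart={"coffee": 2}, order_submitted=True)
print(code_native_reward(partial), code_native_reward(done))
```

Because the checks read the application state directly rather than a screenshot, the signal is reproducible across runs, and counting satisfied checks is one simple way to obtain a graded rather than binary reward.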

Paper Structure (31 sections, 1 equation, 8 figures, 3 tables)

Figures (8)

  • Figure 1: Overview of GUI-GENESIS.
  • Figure 2: Code-native reward correlates with VLM-as-a-judge reward but provides more accurate and fine-grained reward value.
  • Figure 3: Distribution of trajectory lengths. A comparison of the number of steps (unique screenshots) required to solve tasks in real-world vs. synthesized environments. The synthesized environments exhibit a trajectory length distribution similar to that of the real-world ones, implying that they do not trivialize the task logic.
  • Figure 4: Scaling the number of synthetic environments yields continuous improvement, as verified by both VLM SR and code-native SR on the synthesized evaluation dataset.
  • Figure 5: Case studies in which the code model generates applications that it cannot itself navigate successfully.
  • ...and 3 more figures