GUI-GENESIS: Automated Synthesis of Efficient Environments with Verifiable Rewards for GUI Agent Post-Training
Yuan Cao, Dezhi Ran, Mengzhou Wu, Yuzhe Guo, Xin Chen, Ang Li, Gang Cao, Gong Zhi, Hao Yu, Linyi Li, Wei Yang, Tao Xie
TL;DR
GUI-GENESIS tackles the efficiency and verifiability bottlenecks in GUI agent training by synthesizing task-specific, executable web environments with code-native rewards. The approach combines trace-driven context, hierarchical code synthesis with meta-prompting, and automated verification to produce deterministic feedback and near-instant simulation. Empirical results on WeChat Mini-Apps demonstrate strong sim-to-real transfer: training with code-native rewards yields the largest gains in real-world success rate (SR) while substantially reducing latency and cost compared with real-world training. A notable finding is the synthesis-navigation gap: models can synthesize environments they cannot yet solve, which suggests a path toward self-improving agents in which the generator and the agent co-evolve to tackle increasingly challenging tasks.
Abstract
Post-training GUI agents in interactive environments is critical for developing generalization and long-horizon planning capabilities. However, training on real-world applications is hindered by high latency, poor reproducibility, and unverifiable rewards that rely on noisy visual proxies. To address these limitations, we present GUI-GENESIS, the first framework to automatically synthesize efficient GUI training environments with verifiable rewards. GUI-GENESIS reconstructs real-world applications into lightweight web environments using multimodal code models and equips them with code-native rewards: executable assertions that provide deterministic reward signals and eliminate visual estimation noise. Extensive experiments show that GUI-GENESIS reduces environment latency by 10 times and costs by over $28,000 per epoch compared to training on real applications. Notably, agents trained with GUI-GENESIS outperform the base model by 14.54% and even real-world RL baselines by 3.27% on held-out real-world tasks. Finally, we observe that models can synthesize environments they cannot yet solve, highlighting a pathway for self-improving agents.
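To make the idea of a code-native reward concrete, the following is a minimal sketch, not the paper's implementation. It assumes a toy environment whose state is directly inspectable by code; the names `AppState`, `add_to_cart`, `place_order`, `TASK_CHECKS`, and `run_episode` are hypothetical, and a Python object stands in for the synthesized web environment. The point it illustrates is that the reward is computed by executable assertions over the environment state, so the same action sequence always receives the same reward, with no visual estimation involved.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AppState:
    """Hypothetical internal state of a synthesized environment."""
    cart: Dict[str, int] = field(default_factory=dict)
    order_placed: bool = False


# Hypothetical agent actions; in a real setup these would be GUI events.
def add_to_cart(state: AppState, item: str, qty: int = 1) -> None:
    state.cart[item] = state.cart.get(item, 0) + qty


def place_order(state: AppState) -> None:
    # An order can only be placed when the cart is non-empty.
    if state.cart:
        state.order_placed = True


Check = Callable[[AppState], bool]


def code_native_reward(state: AppState, checks: List[Check]) -> float:
    """Return 1.0 only if every assertion over the state holds, else 0.0."""
    return 1.0 if all(check(state) for check in checks) else 0.0


# Assumed task: "order two lattes". Each check is an executable assertion.
TASK_CHECKS: List[Check] = [
    lambda s: s.cart.get("latte") == 2,
    lambda s: s.order_placed,
]


def run_episode(actions: List[Callable[[AppState], None]]) -> float:
    """Apply the actions to a fresh state and score the final state."""
    state = AppState()
    for action in actions:
        action(state)
    return code_native_reward(state, TASK_CHECKS)
```

In this sketch, an episode that adds two lattes and then places the order satisfies both checks, while an episode that adds only one latte, or tries to order with an empty cart, fails at least one check. Because the checks read state rather than pixels, the outcome is reproducible across runs.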
