UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents

Han Xiao, Guozhi Wang, Yuxiang Chai, Zimu Lu, Weifeng Lin, Hao He, Lue Fan, Liuyang Bian, Rui Hu, Liang Liu, Shuai Ren, Yafei Wen, Xiaoxin Chen, Aojun Zhou, Hongsheng Li

TL;DR

UI-Genie tackles two core challenges for GUI agents, verifying trajectory outcomes and scaling high-quality training data, by introducing a unified reward model (UI-Genie-RM) with an image-text interleaved architecture and a self-improving data-model loop. It constructs a large reward-focused dataset (UI-Genie-RM-517k) and a synthetic trajectory corpus (UI-Genie-Agent-16k) without manual annotation, then iteratively refines both the agent and the reward model through reward-guided exploration and outcome verification. Across static and dynamic GUI benchmarks (AndroidControl, AndroidLab, A3), UI-Genie achieves state-of-the-art results, and ablations demonstrate the critical role of history, unified rewards, and the self-improvement loop. The work reduces dependence on human annotations and offers a scalable pathway for robust mobile GUI automation, with open-source releases to spur further research.

Abstract

In this paper, we introduce UI-Genie, a self-improving framework addressing two key challenges in GUI agents: verification of trajectory outcome is challenging and high-quality training data are not scalable. These challenges are addressed by a reward model and a self-improving pipeline, respectively. The reward model, UI-Genie-RM, features an image-text interleaved architecture that efficiently processes historical context and unifies action-level and task-level rewards. To support the training of UI-Genie-RM, we develop deliberately designed data generation strategies including rule-based verification, controlled trajectory corruption, and hard negative mining. To address the second challenge, a self-improvement pipeline progressively expands solvable complex GUI tasks by enhancing both the agent and reward models through reward-guided exploration and outcome verification in dynamic environments. For training the model, we generate UI-Genie-RM-517k and UI-Genie-Agent-16k, establishing the first reward-specific dataset for GUI agents while demonstrating high-quality synthetic trajectory generation without manual annotation. Experimental results show that UI-Genie achieves state-of-the-art performance across multiple GUI agent benchmarks with three generations of data-model self-improvement. We open-source our complete framework implementation and generated datasets to facilitate further research at https://github.com/Euphoria16/UI-Genie.
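
The self-improvement pipeline described in the abstract can be pictured as a loop: roll out the agent with a reward model steering each step, verify the outcome, and add the result to the training sets. The Python sketch below is a toy illustration under stated assumptions, not the authors' implementation. The greedy per-step selection, the `propose`, `step_reward`, and `task_reward` functions, and the `GOLD` lookup table are all hypothetical stand-ins for the agent model, UI-Genie-RM, and a dynamic environment.

```python
from typing import Dict, List, Sequence, Tuple

# Toy ground truth standing in for a real environment (assumption, illustration only).
GOLD: Dict[str, List[str]] = {
    "open wifi settings": ["tap:Settings", "tap:Wi-Fi", "DONE"],
    "mute phone": ["tap:Settings", "tap:Sound", "tap:Mute", "DONE"],
}
ACTIONS: List[str] = ["tap:Settings", "tap:Wi-Fi", "tap:Sound", "tap:Mute", "DONE"]


def propose(task: str, history: List[str]) -> Sequence[str]:
    """Hypothetical agent: proposes every known action as a candidate."""
    return ACTIONS


def step_reward(task: str, history: List[str], action: str) -> float:
    """Hypothetical action-level reward: 1.0 if the action is the next gold step."""
    gold = GOLD[task]
    i = len(history)
    return 1.0 if i < len(gold) and gold[i] == action else 0.0


def task_reward(task: str, actions: List[str]) -> bool:
    """Hypothetical task-level reward: did the whole trajectory solve the task?"""
    return list(actions) == GOLD[task]


def explore(task: str, max_steps: int) -> List[str]:
    """Reward-guided rollout (greedy simplification): keep the best candidate per step."""
    history: List[str] = []
    for _ in range(max_steps):
        candidates = propose(task, history)
        best = max(candidates, key=lambda a: step_reward(task, history, a))
        history.append(best)
        if best == "DONE":
            break
    return history


def self_improve(
    tasks: List[str], generations: int = 3, max_steps: int = 8
) -> Tuple[List[Tuple[str, List[str]]], List[Tuple[str, List[str], bool]]]:
    """Expand agent and reward training sets over several generations."""
    agent_data: List[Tuple[str, List[str]]] = []
    reward_data: List[Tuple[str, List[str], bool]] = []
    solved = set()
    for _ in range(generations):
        for task in tasks:
            if task in solved:
                continue
            actions = explore(task, max_steps)
            ok = task_reward(task, actions)  # outcome verification
            reward_data.append((task, actions, ok))  # both outcomes feed the reward set
            if ok:
                agent_data.append((task, actions))  # only verified successes feed the agent
                solved.add(task)
        # A real pipeline would finetune both models on the expanded sets here.
    return agent_data, reward_data
```

Calling `self_improve(list(GOLD))` returns the verified trajectories alongside the labeled reward samples. In this toy version nothing is retrained between generations, so a task that fails once fails again. In the framework described above, finetuning between generations is what is meant to let later rounds solve harder tasks.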

Paper Structure

This paper contains 28 sections, 1 equation, 15 figures, 8 tables.

Figures (15)

  • Figure 1: UI-Genie dataset statistics. UI-Genie-RM-517k is the first dedicated GUI agent reward dataset, while UI-Genie-Agent-16k contains synthetic trajectories without manual annotation.
  • Figure 2: Overview of UI-Genie-RM model and reward training data construction. The model processes task instruction, historical context, current screenshot, and candidate action as inputs. Outputs are supervised by both action-level and task-level rewards. The training data are constructed by rule-based verification, trajectory corruption, and hard negative mining processes.
  • Figure 3: Self-improvement of agent and reward models for UI-Genie. It expands training sets for both agent and reward models through reward-guided trajectory exploration and outcome verification, then finetunes both models. This process repeats iteratively to improve capabilities on increasingly complex tasks.
  • Figure 4: Performance evolution across iterative self-improvement rounds.
  • Figure 5: Step-level reward evaluation prompt used for comparative baseline models.
  • ...and 10 more figures
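
The Figure 2 caption lists the reward model's inputs (task instruction, historical context, current screenshot, candidate action) and names trajectory corruption as one source of training data. The Python sketch below shows one way such samples might be laid out. It is an assumption-laden illustration, not the paper's procedure: the `RewardSample` fields, the random action swap for action-level negatives, and truncation as the corruption for task-level negatives are all hypothetical choices.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical action vocabulary used to draw wrong candidates.
ACTION_SPACE: List[str] = ["tap:Settings", "tap:Wi-Fi", "tap:Sound", "DONE"]


@dataclass
class RewardSample:
    instruction: str          # task instruction
    history: List[str]        # prior actions (a real model would interleave screenshots)
    screenshot: str           # placeholder path for the current screen
    candidate: Optional[str]  # action being judged; None for task-level samples
    level: str                # "action" or "task"
    label: int                # 1 = positive, 0 = negative


def build_samples(
    instruction: str,
    screenshots: List[str],
    actions: List[str],
    rng: random.Random,
) -> List[RewardSample]:
    """Turn one successful trajectory into action-level and task-level samples.

    Assumes screenshots[i] is the screen observed before actions[i].
    """
    samples: List[RewardSample] = []
    for i, gold in enumerate(actions):
        history = actions[:i]
        # Positive: the action actually taken at this step.
        samples.append(RewardSample(instruction, history, screenshots[i], gold, "action", 1))
        # Negative: swap in a different action (may be noisy if alternatives are valid).
        wrong = rng.choice([a for a in ACTION_SPACE if a != gold])
        samples.append(RewardSample(instruction, history, screenshots[i], wrong, "action", 0))
    # Task-level positive: the full trajectory.
    samples.append(RewardSample(instruction, list(actions), screenshots[-1], None, "task", 1))
    # Task-level negative: corrupt the trajectory by truncating it before completion.
    if len(actions) > 1:
        cut = rng.randrange(1, len(actions))
        samples.append(
            RewardSample(instruction, actions[:cut], screenshots[cut - 1], None, "task", 0)
        )
    return samples


demo = build_samples(
    "open wifi settings",
    ["s0.png", "s1.png", "s2.png"],
    ["tap:Settings", "tap:Wi-Fi", "DONE"],
    random.Random(0),
)
```

Keeping both reward levels in one record type mirrors the idea of a single model supervised by action-level and task-level rewards. A real dataset would also need rule-based verification and hard negative mining, which this sketch omits.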