MobileRL: Online Agentic Reinforcement Learning for Mobile GUI Agents

Yifan Xu; Xiao Liu; Xinghan Liu; Jiaqi Fu; Hanchen Zhang; Bohao Jing; Shudan Zhang; Yuting Wang; Wenyi Zhao; Yuxiao Dong

TL;DR

This work tackles the challenge of training mobile GUI agents through online agentic reinforcement learning, addressing sparse rewards, heavy-tailed task difficulty, and large-scale sampling bottlenecks. It introduces MobileRL, a framework that combines reasoning-free and reasoning supervised fine-tuning (SFT) with AdaGRPO, an algorithm that integrates Shortest-Path Reward Adjustment, Difficulty-Adaptive Positive Replay, and Failure Curriculum Filtering to improve sample efficiency and stability. Empirical results on AndroidWorld and AndroidLab show state-of-the-art success rates with open backbones (e.g., GLM-4.1V-9B-Base reaches 80.2% on AndroidWorld and 53.6% on AndroidLab), and ablations confirm the value of each AdaGRPO component and of the SFT stages. The work also demonstrates scalable, reproducible training across hundreds of Android emulators, advancing practical deployment of autonomous mobile GUI agents and providing an open-source framework for future research.

Abstract

Building general-purpose graphical user interface (GUI) agents has become increasingly promising with the progress in vision-language models. However, developing effective mobile GUI agents with reinforcement learning (RL) remains challenging due to the heavy-tailed distribution of task difficulty and the inefficiency of large-scale environment sampling. We present MobileRL, an online agentic reinforcement learning framework that enhances GUI agents in mobile environments. Its core component is the Difficulty-ADAptive GRPO (AdaGRPO) algorithm. In AdaGRPO, we design difficulty-adaptive positive replay and failure curriculum filtering to adapt the model to different task difficulties. We also introduce a shortest-path reward adjustment strategy that reshapes rewards with respect to task length in multi-turn agentic tasks. Together, these strategies stabilize RL training, improve sample efficiency, and yield strong performance across diverse mobile apps and tasks. We apply MobileRL to two open models (Qwen2.5-VL-7B-Instruct and GLM-4.1V-9B-Base). The resulting MobileRL-9B model achieves state-of-the-art success rates on both AndroidWorld (80.2%) and AndroidLab (53.6%). The MobileRL framework is open-sourced at: https://github.com/THUDM/MobileRL.
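
To make the idea of length-aware reward shaping concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not give the formula for shortest-path reward adjustment, so the sketch assumes one simple possibility: within a group of rollouts for the same task, a successful trajectory's reward is scaled by the ratio of the shortest successful trajectory length to its own length, and failures receive zero. The rewards are then normalized within the group in the usual GRPO style (population standard deviation is used here for simplicity). The function names, the scaling rule, and the example numbers are all hypothetical.

```python
from statistics import mean, pstdev


def shortest_path_adjust(successes, lengths):
    """Hypothetical length-aware shaping (an assumption, not the paper's formula).

    Each successful rollout gets reward shortest_success_len / own_len;
    each failed rollout gets 0.0.
    """
    ok_lengths = [n for s, n in zip(successes, lengths) if s]
    if not ok_lengths:
        # No successful rollout in this group: every reward is zero.
        return [0.0] * len(successes)
    shortest = min(ok_lengths)
    return [shortest / n if s else 0.0 for s, n in zip(successes, lengths)]


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: center and scale rewards within one group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative group of four rollouts for one task (made-up numbers).
successes = [True, True, False, True]
lengths = [5, 10, 12, 20]

rewards = shortest_path_adjust(successes, lengths)
advantages = group_relative_advantages(rewards)
```

Under this assumed rule, the shortest successful rollout receives the largest advantage, longer successes receive progressively smaller ones, and the failed rollout receives the lowest.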
