UI-Mem: Self-Evolving Experience Memory for Online Reinforcement Learning in Mobile GUI Agents

Han Xiao, Guozhi Wang, Hao Wang, Shilong Liu, Yuxiang Chai, Yue Pan, Yufeng Zhou, Xiaoxin Chen, Yafei Wen, Hongsheng Li

TL;DR

UI-Mem tackles the core bottlenecks of online reinforcement learning for mobile GUI agents: credit assignment in long-horizon tasks and lack of cross-task experience transfer. It introduces a Hierarchical Experience Memory that stores parameterized templates for workflows, subtask skills, and failure patterns, enabling cross-application reuse. The framework combines Memory-Guided Exploration with Stratified Group Sampling and a Self-Evolving Loop, yielding dense subtask guidance during training while gradually internalizing memory through adaptive retrieval and memory updates. Empirical results on challenging GUI benchmarks show significant gains over baselines and strong cross-task generalization, highlighting the practical impact of structured, evolving experience for GUI agents.

Abstract

Online Reinforcement Learning (RL) offers a promising paradigm for enhancing GUI agents through direct environment interaction. However, its effectiveness is severely hindered by inefficient credit assignment in long-horizon tasks and repetitive errors across tasks due to the lack of experience transfer. To address these challenges, we propose UI-Mem, a novel framework that enhances GUI online RL with a Hierarchical Experience Memory. Unlike traditional replay buffers, our memory accumulates structured knowledge, including high-level workflows, subtask skills, and failure patterns. These experiences are stored as parameterized templates that enable cross-task and cross-application transfer. To effectively integrate memory guidance into online RL, we introduce Stratified Group Sampling, which injects varying levels of guidance across trajectories within each rollout group to maintain outcome diversity, driving the unguided policy toward internalizing guided behaviors. Furthermore, a Self-Evolving Loop continuously abstracts novel strategies and errors to keep the memory aligned with the agent's evolving policy. Experiments on online GUI benchmarks demonstrate that UI-Mem significantly outperforms traditional RL baselines and static reuse strategies, with strong generalization to unseen applications. Project page: https://ui-mem.github.io
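The abstract describes Stratified Group Sampling only at a high level, so the following is a minimal sketch under stated assumptions rather than the authors' implementation. It assumes a rollout group is split into strong, weak, and unguided strata, and that advantages are computed by normalizing rewards within the group, as is common in group-based policy optimization. The split ratios, the function names, and the normalization itself are all hypothetical choices made for illustration.

```python
from statistics import mean, pstdev

# Guidance strata named after the levels described in the paper's overview.
GUIDANCE_LEVELS = ("strong", "weak", "none")


def assign_guidance(group_size, ratios=(0.25, 0.25, 0.5)):
    """Split one rollout group into guidance strata.

    The ratios are hypothetical; the source does not state them here.
    """
    counts = [int(group_size * r) for r in ratios]
    counts[-1] += group_size - sum(counts)  # any remainder goes to the unguided stratum
    levels = []
    for level, count in zip(GUIDANCE_LEVELS, counts):
        levels.extend([level] * count)
    return levels


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within a group (assumed advantage estimator)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    levels = assign_guidance(8)
    # Illustrative outcome: guided rollouts succeed, unguided ones fail.
    rewards = [1.0 if level != "none" else 0.0 for level in levels]
    for level, adv in zip(levels, group_relative_advantages(rewards)):
        print(f"{level:>6}: advantage {adv:+.2f}")
```

Under this assumed estimator, a group whose rollouts all receive the same reward yields zero advantages and therefore no learning signal, which is one way to read the abstract's emphasis on maintaining outcome diversity within each group.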

Paper Structure (49 sections, 7 equations, 16 figures, 4 tables)

Figures (16)

  • Figure 1: Comparison of RL paradigms for GUI agents. (a) Standard Online RL suffers from sparse rewards. (b) Experience Replay and (c) Dense Reward address sample efficiency and credit assignment respectively, but both lack mechanisms for Cross-Task Transfer. (d) Our Framework introduces an Evolving Memory that provides hierarchical guidance for exploration and continuously updates itself by abstracting successful plans and failure patterns from new trajectories, enabling cross-task knowledge transfer.
  • Figure 2: Overview of the proposed UI-Mem framework. Given a task instruction, the agent retrieves hierarchical experience including Workflows, Subtask Skills, and Failure Patterns. We employ Stratified Group Sampling to generate a group of trajectories under varying levels of guidance (Strong, Weak, and No Guidance), enabling effective advantage estimation for Policy Optimization. Finally, a Self-Evolving Loop extracts abstract plans from successful trajectories and diagnoses from failures to update the memory, facilitating continuous refinement and cross-task transfer.
  • Figure 3: Illustration of the Hierarchical Experience Retrieval process. Given a task instruction, the system performs template matching to extract specific variables (e.g., city names). The retrieved experience is instantiated with the extracted variables to form a concrete, actionable plan for the current rollout.
  • Figure 4: The Self-Evolving Loop. Successful plans and failure causes are extracted from new trajectories to continually refine the memory and guide next rollouts.
  • Figure 5: Component analysis of UI-Mem. We evaluate the impact of removing different components in our framework.
  • ...and 11 more figures
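
The retrieval step in the Figure 3 caption (match a template, extract variables, instantiate a plan) can be illustrated with a small sketch. This is an assumption-laden toy, not the paper's retrieval mechanism: regular expressions stand in for whatever template matcher the authors use, and the `MEMORY` entry, its pattern, and the workflow steps are invented for illustration.

```python
import re

# Hypothetical memory entry: a parameterized task template plus a workflow
# whose steps reference the same variable names.
MEMORY = [
    {
        "pattern": r"check the weather in (?P<city>.+)",
        "workflow": [
            "Open the weather app",
            "Search for {city}",
            "Read the forecast for {city}",
        ],
    },
]


def retrieve_plan(instruction):
    """Return an instantiated workflow for the instruction, or None if no template matches."""
    for entry in MEMORY:
        match = re.fullmatch(entry["pattern"], instruction.strip(), flags=re.IGNORECASE)
        if match:
            variables = match.groupdict()  # e.g. {"city": "Paris"}
            return [step.format(**variables) for step in entry["workflow"]]
    return None


if __name__ == "__main__":
    print(retrieve_plan("Check the weather in Paris"))
```

Because the workflow is stored with placeholders rather than concrete values, the same entry serves any instruction that fits the template, which is the property the paper credits for cross-task reuse.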