GUI-Rise: Structured Reasoning and History Summarization for GUI Navigation

Tao Liu; Chongyu Wang; Rongjie Li; Yingchen Yu; Xuming He; Bai Song

TL;DR

GUI-Rise addresses the challenge of robust, long-horizon GUI navigation by integrating structured reasoning, action prediction, and compact history summarization within a multimodal LLM framework. It uses a two-stage training regime: supervised fine-tuning on pseudo-labeled traces, followed by reinforcement learning via Group Relative Policy Optimization (GRPO) with format, action, and history rewards, to align reasoning quality with execution accuracy. Across Mind2Web, AITW, GUIAct, and MiniWob, GUI-Rise achieves state-of-the-art results under identical training data conditions, particularly in out-of-domain and online settings, indicating strong generalization and stability on complex multi-step GUI tasks. Explicit CoT reasoning makes the agent's decisions easier to interpret, and compact history summaries reduce the context carried between steps, which supports deployment in real-world, dynamic interfaces.

Abstract

While Multimodal Large Language Models (MLLMs) have advanced GUI navigation agents, current approaches face limitations in cross-domain generalization and effective history utilization. We present a reasoning-enhanced framework that systematically integrates structured reasoning, action prediction, and history summarization. The structured reasoning component generates coherent Chain-of-Thought analyses combining progress estimation and decision reasoning, which inform both immediate action predictions and compact history summaries for future steps. Based on this framework, we train a GUI agent, GUI-Rise, through supervised fine-tuning on pseudo-labeled trajectories and reinforcement learning with Group Relative Policy Optimization (GRPO). This framework employs specialized rewards, including a history-aware objective, directly linking summary quality to subsequent action performance. Comprehensive evaluations on standard benchmarks demonstrate state-of-the-art results under identical training data conditions, with particularly strong performance in out-of-domain scenarios. These findings validate our framework's ability to maintain robust reasoning and generalization across diverse GUI navigation tasks. Code is available at https://leon022.github.io/GUI-Rise.
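
The abstract names GRPO and three reward signals (format, action, history) but gives no formulas. As a rough illustration only, the sketch below shows the standard group-relative normalization that GRPO applies to rewards sampled for the same prompt. The binary reward values, the equal weights, and the function names `combined_reward` and `group_relative_advantages` are assumptions made for this example, not details taken from the paper.

```python
from statistics import mean, pstdev


def combined_reward(format_ok, action_ok, history_ok, weights=(1.0, 1.0, 1.0)):
    """Hypothetical weighted sum of the three reward signals.

    Binary signals and equal weights are assumptions for illustration;
    the paper's actual reward definitions may differ.
    """
    w_format, w_action, w_history = weights
    return (w_format * float(format_ok)
            + w_action * float(action_ok)
            + w_history * float(history_ok))


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# One group of four sampled responses for the same navigation step.
group_rewards = [
    combined_reward(True, True, True),    # well-formed, correct action, useful summary
    combined_reward(True, False, False),  # well-formed only
    combined_reward(True, True, False),   # correct action, weak summary
    combined_reward(True, False, True),   # wrong action, useful summary
]
advantages = group_relative_advantages(group_rewards)
```

Because each advantage is measured against the group mean, a response is rewarded only for doing better than its siblings, which is why GRPO does not need a separate learned value model.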
