TransBench: Breaking Barriers for Transferable Graphical User Interface Agents in Dynamic Digital Environments
Yuheng Lu, Qian Yu, Hongru Wang, Zeming Liu, Wei Su, Yanping Liu, Yuhang Guo, Maocheng Liang, Yunhong Wang, Haifeng Wang
TL;DR
This work introduces TransBench, the first benchmark specifically designed to evaluate and enhance the transferability of GUI grounding along three dimensions: cross-version, cross-platform, and cross-application. Its multi-platform, multi-version data pipeline covers 81 apps, 1,459 screenshots, and over 65,000 bounding boxes, plus more than 22,000 human-verified grounding instructions. Among the GUI models evaluated, Qwen2.5VL achieves the best grounding accuracy while UGround often yields the smallest localization distance, and fine-tuning on older versions markedly improves cross-version performance. The results reveal substantial transferability gaps, particularly where the Web platform is involved, and demonstrate the practical potential of transferable GUI agents in dynamic real-world environments. The work also acknowledges limitations in computational cost and data efficiency.
Abstract
Graphical User Interface (GUI) agents, which autonomously operate on digital interfaces through natural language instructions, hold transformative potential for accessibility, automation, and user experience. A critical aspect of their functionality is grounding: the ability to map linguistic intents to visual and structural interface elements. However, existing GUI agents often struggle to adapt to the dynamic and interconnected nature of real-world digital environments, where tasks frequently span multiple platforms and applications and are also affected by version updates. To address this, we introduce TransBench, the first benchmark designed to systematically evaluate and enhance the transferability of GUI agents across three key dimensions: cross-version transferability (adapting to version updates), cross-platform transferability (generalizing across platforms such as iOS, Android, and Web), and cross-application transferability (handling tasks spanning functionally distinct apps). TransBench includes 15 app categories with diverse functionalities, capturing essential pages across versions and platforms to enable robust evaluation. Our experiments demonstrate significant improvements in grounding accuracy, showcasing the practical utility of GUI agents in dynamic, real-world environments. Our code and data will be publicly available on GitHub.
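
To make the notions of grounding accuracy and localization distance concrete, the following minimal sketch shows one common way such metrics are computed. It rests on assumptions that are not stated in this text: a prediction is taken to be a single (x, y) click point, a prediction counts as correct when the point falls inside the target element's bounding box, and distance is measured from the point to the box center. The names `Box`, `grounding_accuracy`, and `mean_center_distance` are hypothetical and may not match the benchmark's actual evaluation code.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (hypothetical helper)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        # Points on the border are treated as inside (an assumption).
        return self.left <= x <= self.right and self.top <= y <= self.bottom

    def center(self) -> tuple[float, float]:
        return ((self.left + self.right) / 2, (self.top + self.bottom) / 2)


def _check_inputs(predictions, targets) -> None:
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("at least one example is required")


def grounding_accuracy(predictions, targets) -> float:
    """Fraction of predicted points that land inside their target box."""
    _check_inputs(predictions, targets)
    hits = sum(box.contains(x, y) for (x, y), box in zip(predictions, targets))
    return hits / len(targets)


def mean_center_distance(predictions, targets) -> float:
    """Mean Euclidean distance from each predicted point to its box center."""
    _check_inputs(predictions, targets)
    total = 0.0
    for (x, y), box in zip(predictions, targets):
        cx, cy = box.center()
        total += hypot(x - cx, y - cy)
    return total / len(targets)


if __name__ == "__main__":
    # Illustrative values only, not benchmark data.
    targets = [Box(0, 0, 100, 100), Box(0, 0, 100, 100)]
    predictions = [(50, 50), (200, 200)]
    print(grounding_accuracy(predictions, targets))
    print(mean_center_distance(predictions, targets))
```

Under these assumptions the two metrics can rank models differently: a model may miss the box more often yet land closer to the target on average, which is consistent with different models leading on accuracy and on distance.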
