Vision-Based Mobile App GUI Testing: A Survey
Shengcheng Yu, Chunrong Fang, Ziyuan Tuo, Quanjun Zhang, Chunyang Chen, Zhenyu Chen, Zhendong Su
TL;DR
The paper surveys vision-based mobile app GUI testing, arguing that visual analysis of GUI screenshots addresses core limitations of code- or layout-based approaches. It develops a two-axis taxonomy (source code requirement and approach basis) and organizes the literature into eight topics spanning automated testing, automation-assisted manual testing, and fundamental techniques, using a rigorous methodology that yields near-perfect inter-rater agreement. It highlights how vision-based methods improve GUI element detection, test generation, and cross-environment applicability, and details open challenges in semantics, oracle generation, and scenario understanding. The survey identifies opportunities at the intersection of multimodal perception, knowledge graphs, and large language models to push GUI testing toward more intelligent, scalable, and reusable solutions across devices and platforms.
Abstract
The Graphical User Interface (GUI) has become one of the most significant parts of mobile applications (apps). It is a direct bridge between mobile apps and end users and directly affects the end-user experience. Neglecting GUI quality can undermine the value and effectiveness of the entire mobile app solution. Significant research effort has been devoted to GUI testing, an effective method for ensuring mobile app quality. By conducting rigorous GUI testing, developers can ensure that the visual and interactive elements of mobile apps not only meet functional requirements but also provide a seamless and user-friendly experience. However, traditional solutions, which rely on source code or layout files, face challenges in both effectiveness and efficiency because of the gap between the information they obtain and what the app GUI actually presents. Vision-based mobile app GUI testing approaches emerged with the development of computer vision technologies and have achieved promising progress. In this survey paper, we provide a comprehensive investigation of state-of-the-art techniques across 271 papers, of which 92 are vision-based studies. The survey covers different topics of GUI testing, such as GUI test generation, GUI test record & replay, and GUI testing frameworks. Specifically, the survey emphasizes how vision-based techniques outperform traditional solutions and have gradually taken a vital place in the GUI testing field. Based on the investigation of existing studies, we outline the challenges and opportunities of (vision-based) mobile app GUI testing and propose promising research directions that combine it with emerging techniques.
