LLM-Guided Scenario-based GUI Testing
Shengcheng Yu, Yuchen Ling, Chunrong Fang, Quan Zhou, Yi Zhao, Chunyang Chen, Shaomin Zhu, Zhenyu Chen
TL;DR
This paper addresses the gap between low-level GUI exploration and high-level business logic in mobile app testing by introducing ScenGen, an LLM-guided, scenario-based GUI testing framework. ScenGen uses a multi-agent architecture (Observer, Decider, Executor, Supervisor, Recorder), backed by a structured context memory, to perceive, reason about, act on, verify, and learn from GUI interactions. It combines vision-based widget detection with multi-modal LLM reasoning to produce semantically coherent test sequences aligned with predefined testing scenarios, and the evaluation reports better scenario coverage, robustness, and bug detection than several baselines. The empirical study, which uses a diverse app benchmark and ten realistic scenarios, reports high final localization accuracy, near-perfect scenario completion, and competitive efficiency, pointing to the practical potential of combining visual semantics with scenario-aware planning for automated GUI testing. The work contributes a replication package, a detailed evaluation framework, and a scalable blueprint for extending automated GUI testing toward human-like, scenario-driven testing.
Abstract
Quality assurance of mobile app GUIs has become increasingly important, as the GUI serves as the primary medium of interaction between users and apps. Although numerous automated GUI testing approaches have been developed with diverse strategies, a substantial gap remains between these approaches and the underlying app business logic. Most existing approaches focus on general exploration rather than the completion of specific testing scenarios, and so often miss critical functionalities. Inspired by manual testing, which treats business logic-driven scenarios as the fundamental unit of testing, this paper introduces an approach that leverages large language models to comprehend GUI semantics and their contextual relevance to given scenarios. Building on this capability, we propose ScenGen, an LLM-guided scenario-based GUI testing framework that employs multi-agent collaboration to simulate and automate the phases of manual testing. Specifically, ScenGen integrates five agents: the Observer, Decider, Executor, Supervisor, and Recorder. The Observer perceives the app GUI state by extracting and structuring GUI widgets and layouts and interpreting their semantic information. This information is passed to the Decider, which makes scenario-driven decisions with LLM guidance to identify target widgets and determine actions toward fulfilling specific goals. The Executor performs these operations, while the Supervisor verifies that they align with the intended scenario completion, ensuring traceability and consistency. Finally, the Recorder logs GUI operations into context memory as a knowledge base for subsequent decision-making and monitors runtime bugs. Comprehensive evaluations demonstrate that ScenGen effectively generates scenario-based GUI tests guided by LLM collaboration, achieving higher relevance to business logic and improving the completeness of automated GUI testing.
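
The division of labor among the five agents can be illustrated with a minimal Python sketch. It is not ScenGen's implementation: every name in it (FakeApp, ContextMemory, keyword_llm, run_scenario, and the widget identifiers) is hypothetical, the device is replaced by a toy screen graph, the multi-modal LLM is replaced by a keyword rule, and the Supervisor is reduced to a check that a goal screen has been reached.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Widget:
    widget_id: str
    text: str


@dataclass
class Action:
    kind: str    # only "tap" is modeled in this sketch
    target: str  # widget_id of the widget to act on


@dataclass
class ContextMemory:
    """Hypothetical context memory: the scenario, past actions, observed bugs."""
    scenario: str
    history: List[Action] = field(default_factory=list)
    bugs: List[str] = field(default_factory=list)


class FakeApp:
    """Toy stand-in for a device: each screen maps widget ids to (label, next screen)."""

    def __init__(self) -> None:
        self.screens: Dict[str, Dict[str, Tuple[str, str]]] = {
            "home": {"btn_login": ("Log in", "login"), "btn_about": ("About", "about")},
            "login": {"btn_submit": ("Submit", "done")},
            "about": {},
            "done": {},
        }
        self.current = "home"

    def tap(self, widget_id: str) -> None:
        self.current = self.screens[self.current][widget_id][1]


def observer(app: FakeApp) -> List[Widget]:
    """Observer: structure the current GUI state as a list of widgets."""
    return [Widget(wid, label) for wid, (label, _) in app.screens[app.current].items()]


def keyword_llm(scenario: str, labels: List[str], history: List[str]) -> Optional[str]:
    """Placeholder for an LLM call: pick a label by a fixed preference order."""
    for preferred in ("Log in", "Submit"):
        if preferred in labels:
            return preferred
    return None


def decider(widgets: List[Widget], memory: ContextMemory,
            llm: Callable[[str, List[str], List[str]], Optional[str]]) -> Optional[Action]:
    """Decider: ask the (stubbed) LLM which widget advances the scenario."""
    choice = llm(memory.scenario, [w.text for w in widgets],
                 [a.target for a in memory.history])
    for w in widgets:
        if w.text == choice:
            return Action("tap", w.widget_id)
    return None


def supervisor(app: FakeApp, goal_screen: str) -> bool:
    """Supervisor (simplified assumption): the scenario is done on the goal screen."""
    return app.current == goal_screen


def run_scenario(app: FakeApp, scenario: str, goal_screen: str,
                 llm: Callable[[str, List[str], List[str]], Optional[str]],
                 max_steps: int = 10) -> Tuple[bool, ContextMemory]:
    memory = ContextMemory(scenario)
    for _ in range(max_steps):
        if supervisor(app, goal_screen):
            return True, memory
        action = decider(observer(app), memory, llm)
        if action is None:
            return False, memory  # no widget judged relevant to the scenario
        try:
            app.tap(action.target)          # Executor: perform the operation
        except Exception as exc:            # Recorder: note a runtime failure
            memory.bugs.append(repr(exc))
            return False, memory
        memory.history.append(action)       # Recorder: log the operation
    return supervisor(app, goal_screen), memory
```

Calling run_scenario(FakeApp(), "log in to the app", "done", keyword_llm) walks the toy app from the home screen through the login screen to the goal screen and returns the completion flag together with the recorded action history. In a real setting the placeholder keyword_llm would be replaced by a model call that receives the scenario, the perceived widgets, and the context memory.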
