GUISpector: An MLLM Agent Framework for Automated Verification of Natural Language Requirements in GUI Prototypes

Kristian Kolthoff, Felix Kretzer, Simone Paolo Ponzetto, Alexander Maedche, Christian Bartelt

TL;DR

GUISpector addresses the challenge of verifying natural-language requirements in interactive GUI prototypes amid the rise of LLM-driven GUI development. It introduces a three-part framework that uses a multimodal LLM agent to interpret NL requirements, autonomously verify GUI implementations, and deliver structured, actionable feedback that can drive closed-loop improvements in LLM-based code generation. The experimental evaluation on a diversified dataset shows high precision and recall in identifying requirement satisfaction and violations, with efficient parallelization and manageable costs. The work offers a practical, end-to-end solution for requirements-driven GUI testing and highlights its potential to enhance early-stage development workflows.

Abstract

GUIs are foundational to interactive systems and play a pivotal role in early requirements elicitation through prototyping. Ensuring that GUI implementations fulfill NL requirements is essential for robust software engineering, especially as LLM-driven programming agents become increasingly integrated into development workflows. Existing GUI testing approaches, whether traditional or LLM-driven, often fall short in handling the complexity of modern interfaces, and typically lack actionable feedback and effective integration with automated development agents. In this paper, we introduce GUISpector, a novel framework that leverages a multi-modal (M)LLM-based agent for the automated verification of NL requirements in GUI prototypes. First, GUISpector adapts an MLLM agent to interpret and operationalize NL requirements, enabling it to autonomously plan and execute verification trajectories across GUI applications. Second, GUISpector systematically extracts detailed NL feedback from the agent's verification process, providing developers with actionable insights that can be used to iteratively refine the GUI artifact or directly inform LLM-based code generation in a closed feedback loop. Third, we present an integrated tool that unifies these capabilities, offering practitioners an accessible interface for supervising verification runs, inspecting agent rationales, and managing the end-to-end requirements verification process. We evaluated GUISpector on a comprehensive set of 150 requirements based on 900 acceptance criteria annotations across diverse GUI applications, demonstrating effective detection of requirement satisfaction and violations and highlighting its potential for seamless integration of actionable feedback into automated LLM-driven development workflows. A video presentation showcasing the main capabilities of GUISpector is available at: https://youtu.be/JByYF6BNQeE

Paper Structure

This paper contains 9 sections, 1 figure, 1 table.

Figures (1)

  • Figure 1: Overview of the GUISpector architecture, comprising the Human-in-the-Verification-Loop, the MLLM GUI agent verification loop (with reasoning, actions, and GUI state trajectories), and the agentic implementation-verification loop for the fully autonomous mode.
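
The agentic implementation-verification loop named in Figure 1 can be illustrated with a minimal sketch. This is not GUISpector's implementation: the `Verdict` structure, the function names, the `max_rounds` budget, and the toy verifier and generator (which treat the GUI as a plain string) are all hypothetical stand-ins for the MLLM agent and the LLM-based code generator. It only shows the control flow in which NL feedback on violated requirements is fed back into regeneration until every requirement is satisfied or the budget runs out.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Verdict:
    """Outcome of verifying one requirement (hypothetical structure)."""
    requirement: str
    satisfied: bool
    feedback: str  # NL explanation that would come from the agent's trajectory


# Hypothetical interfaces: a verifier checks one requirement against a GUI,
# a generator revises the GUI given the verdicts of violated requirements.
Verifier = Callable[[str, str], Verdict]
Generator = Callable[[str, List[Verdict]], str]


def implementation_verification_loop(
    gui: str,
    requirements: List[str],
    verify: Verifier,
    regenerate: Generator,
    max_rounds: int = 3,
) -> Tuple[str, List[Verdict]]:
    """Verify, feed violations back to the generator, and repeat."""
    if max_rounds < 0:
        raise ValueError("max_rounds must be non-negative")
    for round_index in range(max_rounds + 1):
        verdicts = [verify(gui, req) for req in requirements]
        failed = [v for v in verdicts if not v.satisfied]
        # Stop when everything passes or the regeneration budget is spent.
        if not failed or round_index == max_rounds:
            return gui, verdicts
        gui = regenerate(gui, failed)
    raise AssertionError("unreachable")


# Toy stand-ins: a requirement counts as satisfied if its name occurs in the GUI.
def toy_verify(gui: str, requirement: str) -> Verdict:
    ok = requirement in gui
    note = "found" if ok else f"element '{requirement}' is missing"
    return Verdict(requirement, ok, note)


def toy_regenerate(gui: str, failed: List[Verdict]) -> str:
    # A real generator would prompt an LLM with the feedback text.
    return gui + "".join(f" <{v.requirement}>" for v in failed)


final_gui, final_verdicts = implementation_verification_loop(
    "<form>", ["login-button", "reset-link"], toy_verify, toy_regenerate
)
```

In a real setting, `verify` would wrap an agent run over the rendered prototype and `regenerate` would pass each verdict's `feedback` to the code generator. A human-in-the-loop variant could pause between rounds so the verdicts can be inspected first.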