NaviQAte: Functionality-Guided Web Application Navigation

Mobina Shahbandeh, Parsa Alian, Noor Nashid, Ali Mesbah

TL;DR

NaviQAte tackles end-to-end web testing by reframing navigation as a question-and-answer task focused on broad functionalities rather than specific, detailed tasks. It introduces a three-phase pipeline (Action Planning, Choice Extraction, and Decision Making) that combines retrieval-augmented functionality concretization, multi-modal webpage context, and LLM-guided ranking of actionable elements. On Mind2Web-Live and Mind2Web-Live-Abstracted, NaviQAte achieves higher success rates and trajectory efficiency than the WebCanvas baseline. The approach supports dynamic web environments and offers potential for broader automated testing and assistive web navigation.

Abstract

End-to-end web testing is challenging due to the need to explore diverse web application functionalities. Current state-of-the-art methods, such as WebCanvas, are not designed for broad functionality exploration; they rely on specific, detailed task descriptions, which limits their adaptability in dynamic web environments. We introduce NaviQAte, which frames web application exploration as a question-and-answer task, generating action sequences for functionalities without requiring detailed parameters. Our three-phase approach uses advanced large language models such as GPT-4o for complex decision-making and cost-effective models such as GPT-4o mini for simpler tasks. NaviQAte focuses on functionality-guided web application navigation, integrating multi-modal inputs such as text and images to enhance contextual understanding. Evaluations on the Mind2Web-Live and Mind2Web-Live-Abstracted datasets show that NaviQAte achieves a 44.23% success rate in user task navigation and a 38.46% success rate in functionality navigation, representing 15% and 33% improvements over WebCanvas, respectively. These results underscore the effectiveness of our approach in advancing automated web application testing.
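
As a rough illustration of how the three phases named in the TL;DR (Action Planning, Choice Extraction, Decision Making) might fit together with a stronger and a cheaper model, here is a minimal Python sketch. It is not the authors' implementation: the `Model` callable, the `Choice` type, the prompts, all function names, and the assignment of the stronger or cheaper model to each phase are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical: a "model" is any callable mapping a prompt string to a reply string.
Model = Callable[[str], str]


@dataclass
class Choice:
    """Hypothetical representation of one actionable element on a page."""
    element_id: str
    description: str


def plan_action(functionality: str, page_text: str, strong: Model) -> str:
    """Action Planning (assumed): ask the stronger model for the next step."""
    return strong(f"Functionality: {functionality}\nPage: {page_text}\nNext step?")


def extract_choices(elements: List[Choice], step: str, cheap: Model) -> List[Choice]:
    """Choice Extraction (assumed): keep elements the cheaper model marks relevant."""
    kept = []
    for el in elements:
        reply = cheap(f"Step: {step}\nElement: {el.description}\nRelevant? yes/no")
        if reply.strip().lower().startswith("yes"):
            kept.append(el)
    return kept


def decide(choices: List[Choice], step: str, strong: Model) -> Choice:
    """Decision Making (assumed): the stronger model picks one candidate by index."""
    menu = "\n".join(f"{i}: {c.description}" for i, c in enumerate(choices))
    reply = strong(f"Step: {step}\nOptions:\n{menu}\nAnswer with an index.")
    return choices[int(reply.strip())]


def navigate_once(functionality: str, page_text: str, elements: List[Choice],
                  strong: Model, cheap: Model) -> Choice:
    """Run the three phases once and return the element chosen to act on."""
    step = plan_action(functionality, page_text, strong)
    # Fall back to all elements if the filter rejects everything.
    candidates = extract_choices(elements, step, cheap) or elements
    return decide(candidates, step, strong)
```

In this sketch the models are passed in as plain callables, so the control flow can be exercised with stub functions before any real LLM client is wired in.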

Paper Structure (33 sections, 2 equations, 5 figures, 3 tables)

Figures (5)

  • Figure 1: Sample task execution steps for "Find a black blazer for men with L size and add to wishlist".
  • Figure 2: Overview of NaviQAte.
  • Figure 3: Example of screenshot annotation.
  • Figure 4: Task success rate of NaviQAte for different subdomains on the Mind2Web-Live test set.
  • Figure 5: Task success rate of NaviQAte for different websites on the Mind2Web-Live test set.

Theorems & Definitions (1)

  • Definition 1: User Task