Toward the Automated Localization of Buggy Mobile App UIs from Bug Descriptions

Antu Saha, Yang Song, Junayed Mahmud, Ying Zhou, Kevin Moran, Oscar Chaparro

TL;DR

This work investigates automated Buggy UI Localization, framing the localization of buggy UIs from bug descriptions as a retrieval task over UI screens (screen localization, SL) and UI components (component localization, CL). It evaluates unimodal and multimodal models (SBert, Clip, Blip) alongside Lucene on a real-world dataset of 87 bug reports with observed behavior (OB) descriptions. No single model dominates both tasks: Blip excels in SL and SBert in CL. A second study shows that incorporating localized buggy UIs can improve buggy code localization by 9%–12% in Hits@10, using end-to-end pipelines that combine UI localization with traditional IR-based code localization. The results underscore the value of blending textual and visual UI information and highlight the importance of OB quality. The work also provides a publicly available benchmark to advance research on Buggy UI Localization and its applications in bug management workflows.
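Because both tasks are framed as retrieval, each model can be viewed as scoring every candidate screen or component against the OB description and returning a ranked list. The sketch below illustrates only that ranking step, assuming the query and candidates have already been encoded as vectors. The toy vectors, the candidate ids, and the helper names `cosine` and `rank_candidates` are hypothetical stand-ins, not the paper's actual encoders (SBert, Clip, Blip) or scoring functions.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(query_vec: Sequence[float],
                    candidates: Dict[str, Sequence[float]]) -> List[str]:
    """Return candidate ids ordered from most to least similar to the query."""
    scored = [(cid, cosine(query_vec, vec)) for cid, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [cid for cid, _ in scored]


# Hypothetical toy embeddings for one OB description and three UI screens.
query = [1.0, 0.0, 0.0]
screens = {
    "s1": [0.0, 1.0, 0.0],
    "s2": [0.9, 0.1, 0.0],
    "s3": [0.5, 0.5, 0.0],
}
ranking = rank_candidates(query, screens)
print(ranking)  # ['s2', 's3', 's1']
```

Here `s2` ranks first because its vector points in nearly the same direction as the query. Cosine similarity is a common way to compare dense embeddings, but whether each model in the study scores candidates exactly this way is an assumption of this sketch.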

Abstract

Bug report management is a costly software maintenance process composed of several challenging tasks. Given the UI-driven nature of mobile apps, bugs typically manifest through the UI; hence, identifying buggy UI screens and UI components (Buggy UI Localization) is important for localizing the buggy behavior and eventually fixing it. However, this task is challenging because developers must reason about bug descriptions (which are often low-quality) and the visual or code-based representations of UI screens. This paper is the first to investigate the feasibility of automating Buggy UI Localization through a comprehensive study that evaluates one textual and two multi-modal deep learning (DL) techniques, as well as one textual unsupervised technique. We evaluate these techniques at two levels of granularity: Buggy UI Screen localization and UI Component localization. Our results illustrate the individual strengths of models that use different representations: models that incorporate visual information perform better for UI screen localization, while models that operate on textual screen information perform better for UI component localization. This highlights the need for a localization approach that blends the benefits of both types of techniques. Furthermore, we study whether Buggy UI Localization can improve traditional buggy code localization and find that incorporating localized buggy UIs leads to improvements of 9%-12% in Hits@10.
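Hits@K, the metric behind the reported 9%-12% gains, is commonly defined as the fraction of queries (here, bug reports) for which at least one relevant item (here, a buggy code file) appears among the top K retrieved results. A minimal sketch of that common definition follows. The function name `hits_at_k`, the dictionary layout, and the file names are hypothetical, and the paper's exact evaluation script may differ.

```python
from typing import Dict, List, Set


def hits_at_k(rankings: Dict[str, List[str]],
              ground_truth: Dict[str, Set[str]],
              k: int = 10) -> float:
    """Return the fraction of queries with at least one relevant item in the top k."""
    if not rankings:
        return 0.0
    hits = 0
    for query_id, ranked_items in rankings.items():
        relevant = ground_truth.get(query_id, set())
        # Count the query as a hit if any of its top-k items is relevant.
        if any(item in relevant for item in ranked_items[:k]):
            hits += 1
    return hits / len(rankings)


# Hypothetical example: two bug reports, each with a ranked list of code files.
rankings = {
    "bug-1": ["A.java", "B.java"],
    "bug-2": ["C.java", "D.java"],
}
ground_truth = {
    "bug-1": {"B.java"},
    "bug-2": {"E.java"},
}
print(hits_at_k(rankings, ground_truth, k=1))  # 0.0
print(hits_at_k(rankings, ground_truth, k=2))  # 0.5
```

With `k=2`, only `bug-1` has a relevant file in its top results, so Hits@2 is 1/2. The reported figure uses K = 10; whether the 9%-12% improvement is absolute or relative is not stated in this summary.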

Paper Structure

This paper contains 28 sections, 6 figures, and 4 tables.

Figures (6)

  • Figure 1: Bug report #191 from the WiFiAnalyzer app [wifianalyzerbug]
  • Figure 2: Example of the UI screen/component localization process for an OB/bug description of the WiFiAnalyzer app [wifianalyzerbug]
  • Figure 3: SL results for different query quality levels
  • Figure 4: CL results for different query quality levels
  • Figure 5: SL results for easy- and hard-to-retrieve tasks
  • ...and 1 more figure