UI Semantic Group Detection: Grouping UI Elements with Similar Semantics in Mobile Graphical User Interface

Shuhong Xiao, Yunnong Chen, Yaxuan Song, Liuqing Chen, Lingyun Sun, Yankun Zhen, Yanfang Chang

TL;DR

This work addresses the fragmentation of UI elements by introducing semantic component groups that bundle adjacent text and non-text elements with shared semantics. A data-driven detector, UISCGD, built on an enhanced Deformable DETR, uses a color representation of UI elements and a learned prior on group distribution to delineate these groups from mobile UI screenshots. The resulting groups can serve several downstream tasks: retrieving perceptual groups, improving code structure in UI-to-code generation, and generating accessibility data for screen readers. The authors train and evaluate the method on a dataset of 1988 mobile GUI screenshots from more than 200 iOS and Android apps and report gains of 6.1% over the best baseline and 5.4% over Deformable DETR.

Abstract

Texts, widgets, and images on a UI page do not work separately. Instead, they are partitioned into groups to achieve certain interaction functions or convey visual information. Existing studies on UI element grouping mainly focus on a single UI-related software engineering task, and their groups vary in appearance and function. To address this, we propose semantic component groups that pack adjacent text and non-text elements with similar semantics. In contrast to task-oriented grouping methods, our semantic component groups can be adopted for multiple UI-related software tasks, such as retrieving UI perceptual groups, improving code structure for automatic UI-to-code generation, and generating accessibility data for screen readers. To recognize semantic component groups on a UI page, we propose a robust, deep learning-based vision detector, UISCGD, which extends the SOTA deformable-DETR by incorporating UI element color representation and a learned prior on group distribution. The model is trained on our UI screenshot dataset of 1988 mobile GUIs from more than 200 apps on both iOS and Android platforms. The evaluation shows that UISCGD outperforms the best baseline algorithm by 6.1% and deformable-DETR, on which it is based, by 5.4%.

Paper Structure

This paper contains 26 sections, 6 equations, 10 figures, 4 tables, and 2 algorithms.

Figures (10)

  • Figure 1: Examples of UI element groups: (a) fragmented UI layer groups; (b) navigation groups for screen reader accessibility; (c) psychologically inspired perceptual groups; (d) our semantic component groups. Groups are marked by red bounding boxes.
  • Figure 2: UI grouping for software tasks
  • Figure 3: (a) A review of multi-scale deformable attention: the attention mechanism is applied between each reference point (in orange) and several sampled points; (b) the predicted bounding boxes, represented by center point and box size, are refined iteratively.
  • Figure 4: Two fusion strategies applied for colormap.
  • Figure 5: The Gaussian function $\alpha(i,j)$ applies a soft weighting to each local correlation in the box refinement.
  • ...and 5 more figures
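
The iterative refinement mentioned in the Figure 3 caption can be illustrated with a minimal sketch. It assumes the standard Deformable DETR update, in which each decoder layer predicts an offset that is added to the previous box in logit space and mapped back with a sigmoid. The function names and offset values below are hypothetical, and the sketch does not include the Gaussian weighting $\alpha(i,j)$ or any other UISCGD-specific change.

```python
import math


def inverse_sigmoid(x, eps=1e-5):
    """Map a value in (0, 1) to logit space, clamped for numerical safety."""
    x = min(max(x, eps), 1.0 - eps)
    return math.log(x / (1.0 - x))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def refine_box(box, delta):
    """Apply one refinement step.

    box:   (cx, cy, w, h), normalized to (0, 1), from the previous layer.
    delta: offsets predicted by the current layer, in logit space
           (hypothetical values in this sketch).
    """
    return tuple(sigmoid(inverse_sigmoid(b) + d) for b, d in zip(box, delta))


def iterative_refinement(initial_box, deltas_per_layer):
    """Refine a box layer by layer and return every intermediate box."""
    box = initial_box
    history = [box]
    for delta in deltas_per_layer:
        box = refine_box(box, delta)
        history.append(box)
    return history


if __name__ == "__main__":
    start = (0.5, 0.5, 0.2, 0.2)
    # Two hypothetical decoder layers, each nudging the center to the right.
    layers = [(0.1, 0.0, 0.0, 0.0), (0.1, 0.0, 0.0, 0.0)]
    for step, b in enumerate(iterative_refinement(start, layers)):
        print(step, tuple(round(v, 4) for v in b))
```

Working in logit space keeps every refined coordinate inside (0, 1) regardless of the size of the predicted offset, which is why the update is applied before the sigmoid and not directly to the normalized box.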