UI Semantic Group Detection: Grouping UI Elements with Similar Semantics in Mobile Graphical User Interface
Shuhong Xiao, Yunnong Chen, Yaxuan Song, Liuqing Chen, Lingyun Sun, Yankun Zhen, Yanfang Chang
TL;DR
This work addresses the fragmentation of UI elements by introducing semantic component groups that bundle adjacent text and non-text elements with shared semantics. A data-driven detector, UISCGD, built on an enhanced Deformable DETR, uses a colormap prior and a learned prior on group distribution to delineate these groups from mobile UI screenshots. The groups support several downstream tasks: retrieving UI perceptual groups, improving code structure in UI-to-code generation, and generating accessibility data for screen readers. The authors train the model on a dataset of 1988 mobile GUI screenshots from more than 200 iOS and Android apps and report that it outperforms the best baseline by 6.1% and Deformable DETR by 5.4%.
Abstract
Texts, widgets, and images on a UI page do not work separately. Instead, they are partitioned into groups that serve a particular interaction function or convey visual information. Existing studies on UI element grouping mainly focus on a single UI-related software engineering task, and the groups they define vary in appearance and function. To address this, we propose semantic component groups, which pack adjacent text and non-text elements with similar semantics. In contrast to task-oriented grouping methods, semantic component groups can be adopted for multiple UI-related software tasks, such as retrieving UI perceptual groups, improving code structure for automatic UI-to-code generation, and generating accessibility data for screen readers. To recognize semantic component groups on a UI page, we propose a robust, deep learning-based vision detector, UISCGD, which extends the state-of-the-art Deformable DETR by incorporating UI element color representation and a learned prior on group distribution. The model is trained on our dataset of 1988 mobile GUI screenshots from more than 200 apps on both the iOS and Android platforms. The evaluation shows that UISCGD outperforms the best baseline algorithm by 6.1% and Deformable DETR, on which it is based, by 5.4%.
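To make the notion of a semantic component group concrete, the following minimal sketch shows one plausible way to represent a group as the axis-aligned box enclosing its member elements. It illustrates only the kind of target a group detector would predict, not the UISCGD model itself. The `Element` class, its fields, and the `enclosing_box` helper are hypothetical names introduced here, and the coordinates are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass(frozen=True)
class Element:
    """A single UI element (hypothetical representation, not from the paper)."""
    kind: str  # "text" or "non-text"
    box: Box


def enclosing_box(elements: List[Element]) -> Box:
    """Return the smallest axis-aligned box that contains every element."""
    if not elements:
        raise ValueError("a group needs at least one element")
    x1 = min(e.box[0] for e in elements)
    y1 = min(e.box[1] for e in elements)
    x2 = max(e.box[2] for e in elements)
    y2 = max(e.box[3] for e in elements)
    return (x1, y1, x2, y2)


# An icon and its adjacent label, assumed to share the same semantics.
icon = Element(kind="non-text", box=(10, 20, 50, 60))
label = Element(kind="text", box=(60, 25, 200, 55))
group_box = enclosing_box([icon, label])
```

Under this assumption, the icon and its label are treated as one unit, so a downstream consumer such as a screen reader or a code generator would handle a single box rather than two fragments.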
