Fragmented Layer Grouping in GUI Designs Through Graph Learning Based on Multimodal Information
Yunnong Chen, Shuhong Xiao, Jiazhi Li, Tingting Zhou, Yanfang Chang, Yankun Zhen, Lingyun Sun, Liuqing Chen
TL;DR
This paper tackles fragmented layer grouping in GUI designs with a graph-learning pipeline that fuses multimodal information from design prototypes. It constructs a UI graph from layer inclusion relationships, uses a GNN enhanced with multi-head attention to classify layers and regress the bounding boxes of their merging groups, and applies an NMS-based box merging strategy to form semantically coherent groups. In experiments on two real-world datasets the method achieves state-of-the-art performance, and a user study indicates that the resulting groups improve the readability and maintainability of code produced by a downstream GUI-to-code tool. By combining multimodal cues with graph-structured reasoning, the approach groups fragmented layers across diverse designs, with practical impact on automated front-end code generation.
Abstract
Automatically constructing GUI groups of different granularities is a critical step towards automating GUI design and implementation tasks. In the industrial GUI-to-code process in particular, fragmented layers may decrease the readability and maintainability of generated code, a problem that can be alleviated by grouping semantically consistent fragmented layers in the design prototypes. This study proposes a graph-learning-based approach to the fragmented layer grouping problem that draws on the multimodal information in design prototypes. Our graph learning module consists of self-attention and graph neural network modules. Taking the multimodal fused representation of GUI layers as input, we group fragmented layers by simultaneously classifying GUI layers and regressing the bounding boxes of the corresponding GUI components. Experiments on two real-world datasets demonstrate that our model achieves state-of-the-art performance. A further user study validates that our approach can assist an intelligent downstream tool in generating more maintainable and readable front-end code.
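The grouping idea summarized above can be illustrated with a minimal sketch. It is not the paper's implementation: the per-layer fragment labels, predicted boxes, and scores are assumed to come from some trained model, the function names (`inclusion_edges`, `nms`, `group_layers`) and the IoU threshold are hypothetical, and plain greedy NMS stands in for the paper's box merging strategy, whose details are not given here.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def contains(outer: Box, inner: Box) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def inclusion_edges(layers: List[Box]) -> List[Tuple[int, int]]:
    """Directed edges (container, contained) between layer boxes."""
    return [(i, j)
            for i, a in enumerate(layers)
            for j, b in enumerate(layers)
            if i != j and contains(a, b)]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], thr: float = 0.5) -> List[int]:
    """Greedy NMS: keep the highest-scoring box, drop overlapping ones."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) < thr for k in keep):
            keep.append(i)
    return keep


def group_layers(layers: List[Box], is_fragment: List[bool],
                 pred_boxes: List[Box], scores: List[float],
                 thr: float = 0.5) -> List[List[int]]:
    """Group fragment layers that fall inside each surviving predicted box."""
    cand = [i for i, f in enumerate(is_fragment) if f]
    kept = nms([pred_boxes[i] for i in cand], [scores[i] for i in cand], thr)
    groups = []
    for k in kept:
        box = pred_boxes[cand[k]]
        members = [i for i in cand if contains(box, layers[i])]
        if members:
            groups.append(members)
    return groups


# Toy data (made up): three fragments of one icon, a button, a background.
layers = [(0, 0, 10, 10), (12, 0, 22, 10), (0, 12, 22, 20),
          (50, 50, 60, 60), (0, 0, 100, 100)]
is_fragment = [True, True, True, False, False]
pred_boxes = [(0, 0, 22, 20), (0, 0, 22, 21), (1, 0, 22, 20),
              (50, 50, 60, 60), (0, 0, 100, 100)]
scores = [0.9, 0.8, 0.7, 0.2, 0.1]

print(inclusion_edges(layers))
print(group_layers(layers, is_fragment, pred_boxes, scores))
```

In this toy case the three fragment layers each predict nearly the same group box, so NMS keeps one box and the three layers end up in a single group. In a learned pipeline, the inclusion edges would define the graph over which the network passes messages before producing the labels and boxes used here.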
