Fragmented Layer Grouping in GUI Designs Through Graph Learning Based on Multimodal Information

Yunnong Chen, Shuhong Xiao, Jiazhi Li, Tingting Zhou, Yanfang Chang, Yankun Zhen, Lingyun Sun, Liuqing Chen

TL;DR

This paper tackles fragmented layer grouping in GUI designs by introducing a graph-learning pipeline that fuses multimodal information from design prototypes. It constructs a UI graph from layer inclusions, uses a multi-head attention enhanced GNN to classify layers and regress merging-group bounding boxes, and applies a novel NMS-based box merging strategy to form semantically coherent groups. Through extensive experiments and a user study, the method achieves state-of-the-art performance on real-world datasets and demonstrates improved code readability and maintainability for downstream GUI-to-code tools. The work advances GUI understanding by leveraging multimodal cues and graph-structured reasoning to robustly group fragmented layers across diverse designs, with practical impact on automated front-end code generation.
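
The graph-construction step can be illustrated with a minimal sketch. It assumes each layer is an axis-aligned rectangle and that an edge links a layer to the smallest other layer that fully contains it. The `Layer` class, the `build_inclusion_edges` helper, and the "smallest container" rule are hypothetical choices made for illustration; the summary only states that the UI graph is built from layer inclusions, so the authors' actual construction may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    """Hypothetical layer record: a name plus an axis-aligned bounding box."""
    name: str
    x1: float
    y1: float
    x2: float
    y2: float


def area(layer: Layer) -> float:
    return (layer.x2 - layer.x1) * (layer.y2 - layer.y1)


def contains(outer: Layer, inner: Layer) -> bool:
    """True when `inner` lies entirely within `outer`."""
    return (outer.x1 <= inner.x1 and outer.y1 <= inner.y1
            and outer.x2 >= inner.x2 and outer.y2 >= inner.y2)


def build_inclusion_edges(layers: list[Layer]) -> list[tuple[int, int]]:
    """Return (parent, child) index pairs.

    Assumption: each layer is linked only to its smallest strictly larger
    container, which yields a tree-like graph.
    """
    edges = []
    for i, child in enumerate(layers):
        parents = [
            j for j, parent in enumerate(layers)
            if j != i and contains(parent, child) and area(parent) > area(child)
        ]
        if parents:
            closest = min(parents, key=lambda j: area(layers[j]))
            edges.append((closest, i))
    return edges


layers = [
    Layer("card", 0, 0, 100, 100),
    Layer("icon_bg", 10, 10, 40, 40),
    Layer("icon_stroke", 15, 15, 35, 35),
    Layer("title", 50, 10, 90, 30),
]
edges = build_inclusion_edges(layers)
print(edges)
```

In this toy layout, `icon_stroke` attaches to `icon_bg` rather than to `card`, because `icon_bg` is its tightest container. A full pipeline would attach the multimodal layer features to each node; that part is not sketched here.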

Abstract

Automatically constructing GUI groups of different granularities constitutes a critical intelligent step towards automating GUI design and implementation tasks. Specifically, in the industrial GUI-to-code process, fragmented layers may decrease the readability and maintainability of generated code, which can be alleviated by grouping semantically consistent fragmented layers in the design prototypes. This study proposes a graph-learning-based approach to tackle the fragmented layer grouping problem using the multimodal information in design prototypes. Our graph learning module consists of self-attention and graph neural network modules. Taking the multimodal fused representation of GUI layers as input, we group fragmented layers by simultaneously classifying GUI layers and regressing the bounding boxes of the corresponding GUI components. Experiments on two real-world datasets demonstrate that our model achieves state-of-the-art performance. A further user study validates that our approach can help an intelligent downstream tool generate more maintainable and readable front-end code.

Paper Structure

This paper contains 37 sections, 20 equations, 7 figures, 7 tables, 1 algorithm.

Figures (7)

  • Figure 1: Overview of Task Introduction and Challenge Specifications. The design prototype contains a layer tree with a view hierarchy structure. UI layers are organized within this hierarchy, where layers of different components may overlap due to the aesthetic style of the design. By grouping fragmented layers, we can reorganize the GUI layers into a clean view hierarchy (as shown by ①②③④ in the figure). The four image blocks on the right visualize how each merged UI component is organized through layers.
  • Figure 2: Overview of the proposed method. By accessing the view hierarchy of the design prototype, we construct a UI graph based on the geometric relationships between layers. We propose a graph learning block to capture the semantic associations and spatial structure between layers. We design a layer classification branch (cls) and a bounding box regression branch (loc), which are used to classify layers and regress the bounding boxes of merging groups, respectively.
  • Figure 3: The process of fragmented layer grouping. To group fragmented layers, we first localize merging groups and classify layers to determine if they are fragmented layers. Then, within the bounding boxes of the detected merging groups, we group the fragmented layers that need to be merged. We use solid red lines to represent the predicted bounding boxes of merging groups and dashed red lines to represent the fragmented layers.
  • Figure 4: The workflow of UI graph construction.
  • Figure 5: Details of graph learning blocks. Inspired by rampavsek2022recipe, we introduce a multi-head attention module to our graph learning blocks to break through the fundamental limitations of GNNs.
  • ...and 2 more figures
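
The grouping process described in the Figure 3 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses plain greedy non-maximum suppression (NMS) as a stand-in for the paper's NMS-based box merging strategy, whose details are not given here. It also assumes a fragmented layer joins the first kept merging-group box that fully contains it. The helper names (`iou`, `nms`, `group_layers`), the 0.5 threshold, and the toy boxes are all hypothetical.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def box_area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: list[Box], scores: list[float], thr: float = 0.5) -> list[int]:
    """Greedy NMS: keep the best-scoring box, drop boxes overlapping it above thr."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: list[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= thr for k in keep):
            keep.append(i)
    return keep


def inside(inner: Box, outer: Box) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def group_layers(layer_boxes: list[Box], is_fragment: list[bool],
                 group_boxes: list[Box]) -> list[list[int]]:
    """Assign each layer classified as fragmented to the first group box containing it."""
    groups: list[list[int]] = [[] for _ in group_boxes]
    for idx, (box, frag) in enumerate(zip(layer_boxes, is_fragment)):
        if not frag:
            continue  # non-fragmented layers are left untouched
        for g, gbox in enumerate(group_boxes):
            if inside(box, gbox):
                groups[g].append(idx)
                break
    return groups


# Toy predictions: two near-duplicate group boxes and one distinct box.
pred_boxes = [(10, 10, 40, 40), (11, 11, 41, 41), (50, 50, 90, 90)]
pred_scores = [0.9, 0.8, 0.7]
keep = nms(pred_boxes, pred_scores)
group_boxes = [pred_boxes[i] for i in keep]

layer_boxes = [(12, 12, 20, 20), (25, 25, 38, 38), (55, 55, 80, 80), (0, 0, 100, 100)]
is_fragment = [True, True, True, False]  # output of the classification branch
groups = group_layers(layer_boxes, is_fragment, group_boxes)
print(keep, groups)
```

Here the second predicted box is suppressed as a near-duplicate of the first, and the background layer is skipped because it is not classified as fragmented. Containment is a strict test; a real system would likely need a tolerance for regressed boxes that slightly undershoot the layers they should cover.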