Graph4GUI: Graph Neural Networks for Representing Graphical User Interfaces

Yue Jiang, Changkong Zhou, Vikas Garg, Antti Oulasvirta

TL;DR

Graph4GUI introduces a graph-based GUI representation that jointly models GUI element properties and layout constraints within a heterogeneous bipartite graph. A graph neural network (GNN) operates on this graph to predict the positions and sizes of unplaced elements for autocompletion, yielding better-aligned and more visually appealing results than the baseline and powering a Figma plug-in that fits into designers' workflows. The approach is validated through autocompletion experiments on mobile GUI data, a comparison study against GRIDS with human participants, and a designer study reporting perceived usability and efficiency gains; additional applications include GUI topic classification and retrieval. The authors suggest future work on richer constraints and more complex layouts, and acknowledge limitations in capturing semantic correspondences across elements and in representing the view hierarchy.

Abstract

Present-day graphical user interfaces (GUIs) exhibit diverse arrangements of text, graphics, and interactive elements such as buttons and menus, but representations of GUIs have not kept up. They do not encapsulate both semantic and visuo-spatial relationships among elements. To seize machine learning's potential for GUIs more efficiently, Graph4GUI exploits graph neural networks to capture individual elements' properties and their semantic-visuo-spatial constraints in a layout. The learned representation demonstrated its effectiveness in multiple tasks, especially generating designs in a challenging GUI autocompletion task, which involved predicting the positions of remaining unplaced elements in a partially completed GUI. The new model's suggestions showed alignment and visual appeal superior to the baseline method and received higher subjective ratings for preference. Furthermore, we demonstrate the practical benefits and efficiency advantages designers perceive when utilizing our model as an autocompletion plug-in.

Paper Structure

This paper contains 79 sections, 11 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: a) Graph4GUI represents each GUI element through a separate node with properties. GUI element nodes convey the element properties, including visual appearance, textual content, element type, position, and size. b) Constraint nodes express four types of constraints: alignment, same-size, element grouping, and multimodal grouping constraints.
  • Figure 2: Graph4GUI was adapted for the autocompletion task: We first encode the graph representation of the GUI via the GNN. We only illustrate some parts of the graph for simplicity. Element 8 is the target to-be-placed element. In each GNN layer, nodes perform aggregation from their respective neighbors. To illustrate, consider element node 3. As it goes through the GNN layers, it accumulates information from related constraint nodes and other element nodes. This process results in feature embedding vectors for all nodes, including both element nodes and constraint nodes within the graph. We compute the graph embedding as a weighted average of the node embeddings with the weight matrix $W$. We then concatenate the target element's embedding vector, the graph embedding, and a constraint embedding and send it to fully connected layers to predict whether the target to-be-placed element should satisfy the constraint. Simultaneously, we concatenate the target element's embedding and the graph embedding to predict the initial position and size of the target element. Integrating these predictions with the constraints, we subsequently refine the position and size to obtain the final results.
  • Figure 3: a) Our model can iteratively predict unplaced GUI elements (shown in blue bounding boxes). b) Designers can make adjustments (orange), including moving, resizing, or re-selecting GUI elements. c) The model's capability to predict groupings allows for the placement of elements together as a group. d) The model can also predict all the elements simultaneously.
  • Figure 4: Comparison of our model with GRIDS (Dayama et al., 2020), an autocompletion approach using integer programming, and with an upper bound on the model's performance obtained by using ground-truth constraints to predict positions and sizes. The evaluation used three metrics: position error, size error, and alignment error. Results are from five-fold cross-validation, with the mean and standard deviation shown in the corresponding plots.
  • Figure 5: Results from an ablation study comparing our model's performance to ablated models in which each type of constraint has been removed.
  • ...and 3 more figures
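
The captions of Figures 1 and 2 describe a bipartite graph of element nodes and constraint nodes, neighbor aggregation in each GNN layer, and a graph embedding computed as a weighted average of node embeddings. The following is a minimal sketch of that data flow in plain Python, not the authors' implementation. Everything named here is an assumption made for illustration: the `Element` class, the tolerance-based rule for detecting constraints, the use of raw geometry as node features, unlearned mean aggregation in place of trained GNN layers, and scalar per-node weights in place of the weight matrix $W$. Only two of the four constraint types (alignment and same-size) are shown.

```python
from dataclasses import dataclass


@dataclass
class Element:
    # Hypothetical element record; the paper's nodes also carry visual,
    # textual, and type features, which are omitted here.
    name: str
    x: float
    y: float
    w: float
    h: float


def group_by(elements, related):
    """Group element indices whose members satisfy `related` with the group's first member."""
    groups = []
    for i, e in enumerate(elements):
        for g in groups:
            if related(elements[g[0]], e):
                g.append(i)
                break
        else:
            groups.append([i])
    # A constraint only makes sense when it links at least two elements.
    return [g for g in groups if len(g) > 1]


def build_constraints(elements, tol=1.0):
    """Return constraint nodes as (kind, member indices); each membership is a bipartite edge."""
    left = group_by(elements, lambda a, b: abs(a.x - b.x) <= tol)
    size = group_by(
        elements, lambda a, b: abs(a.w - b.w) <= tol and abs(a.h - b.h) <= tol
    )
    return [("left_align", g) for g in left] + [("same_size", g) for g in size]


def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def message_pass(elem_feats, constraints):
    """One round of mean aggregation (assumed stand-in for a learned GNN layer)."""
    # Constraint nodes aggregate from the elements they connect.
    cons_feats = [mean([elem_feats[i] for i in members]) for _, members in constraints]
    # Element nodes aggregate from themselves and their adjacent constraint nodes.
    new_elem = []
    for i, f in enumerate(elem_feats):
        neigh = [cons_feats[c] for c, (_, m) in enumerate(constraints) if i in m]
        new_elem.append(mean([f] + neigh))
    return new_elem, cons_feats


def readout(node_feats, weights):
    """Graph embedding as a weighted average of node embeddings."""
    total = sum(weights)
    dim = len(node_feats[0])
    return [sum(w * f[d] for w, f in zip(weights, node_feats)) / total for d in range(dim)]


elements = [
    Element("title", 16, 20, 200, 40),
    Element("image", 16, 80, 200, 120),
    Element("ok", 16, 220, 90, 40),
    Element("cancel", 126, 220, 90, 40),
]
constraints = build_constraints(elements)
elem_feats = [[e.x, e.y, e.w, e.h] for e in elements]
elem_h, cons_h = message_pass(elem_feats, constraints)
graph_vec = readout(elem_h + cons_h, [1.0] * (len(elem_h) + len(cons_h)))
```

In this toy layout, the title, image, and "ok" button share a left edge, and the two buttons share a size, so the "cancel" button receives information about the left-aligned column only indirectly, through the same-size constraint node it shares with "ok". In the pipeline described in Figure 2, the target element's embedding and the graph embedding would then be concatenated and passed to fully connected layers; that learned part is not sketched here.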