Global graph features unveiled by unsupervised geometric deep learning

Mirja Granfors, Jesús Pineda, Blanca Zufiria Gerbolés, Joana B. Pereira, Carlo Manzo, Giovanni Volpe

TL;DR

This paper introduces GAUDI (Graph Autoencoder Uncovering Descriptive Information), an unsupervised geometric deep learning framework that captures both the local details and the global structure of graphs, providing new insights into emergent phenomena across diverse scientific domains.

Abstract

Graphs provide a powerful framework for modeling complex systems, but their structural variability poses significant challenges for analysis and classification. To address these challenges, we introduce GAUDI (Graph Autoencoder Uncovering Descriptive Information), a novel unsupervised geometric deep learning framework designed to capture both local details and global structure. GAUDI employs an innovative hourglass architecture with hierarchical pooling and upsampling layers linked through skip connections, which preserve essential connectivity information throughout the encoding-decoding process. Even though identical or highly similar underlying parameters describing a system's state can lead to significant variability in graph realizations, GAUDI consistently maps them into nearby regions of a structured and continuous latent space, effectively disentangling invariant process-level features from stochastic noise. We demonstrate GAUDI's versatility across multiple applications, including small-world network modeling, characterization of protein assemblies from super-resolution microscopy, analysis of collective motion in the Vicsek model, and identification of age-related changes in brain connectivity. Comparison with related approaches highlights GAUDI's superior performance in analyzing complex graphs, providing new insights into emergent phenomena across diverse scientific domains.
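
The pooling and upsampling steps linked by skip connections can be illustrated with a minimal NumPy sketch. It assumes the usual dense formulation of cluster-based pooling, in which a soft assignment matrix $S$ (nodes by clusters) coarsens node features $X$ and adjacency $A$, and the same $S$ is reused to upsample. The function names, the random $S$, and the toy ring graph are illustrative assumptions, not the authors' implementation; a trained model would learn $S$ from the node features, and MinCut pooling additionally normalizes the pooled adjacency, which is omitted here.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def pool(X, A, S):
    """Coarsen a graph with a soft cluster assignment S of shape (N, K)."""
    X_pool = S.T @ X        # (K, F) features of the K clusters
    A_pool = S.T @ A @ S    # (K, K) connectivity between clusters
    return X_pool, A_pool

def upsample(X_pool, S):
    """Distribute cluster features back onto the N original nodes."""
    return S @ X_pool       # (N, F)

# Toy input (assumption): a ring of N nodes with random features.
rng = np.random.default_rng(0)
N, F, K = 6, 3, 2
X = rng.normal(size=(N, F))
A = np.zeros((N, N))
for i in range(N):
    A[i, (i + 1) % N] = A[(i + 1) % N, i] = 1.0

# Stand-in for a learned assignment: each row is a distribution over clusters.
S = softmax(rng.normal(size=(N, K)), axis=1)

X_pool, A_pool = pool(X, A, S)   # encoder side
X_up = upsample(X_pool, S)       # decoder side reuses S via the skip connection
```

Because each row of $S$ sums to one, the total edge weight is unchanged by pooling, and passing $S$ and $A$ from encoder to decoder means the latent code does not have to store the node-to-cluster bookkeeping itself.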

Paper Structure

This paper contains 18 sections, 8 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: GAUDI architecture. a, Graphs with the same underlying parameters $p$ can exhibit significant structural variability. Despite this, GAUDI maps graphs originating from the same parameters close together in the latent space, demonstrating its ability to capture underlying similarities. b, GAUDI uses a hierarchical graph-convolutional variational autoencoder architecture, where an encoder progressively compresses the graph into a low-dimensional latent space, and a decoder reconstructs the graph from the latent embedding. At every level of compression, adjacency matrices $A^{l_{\text{enc}}}$ and cluster assignment matrices $S^{l_{\text{enc}}}$ are sent directly from the encoder to the decoder to ensure the network embeds the overall structural features. c, Each encoder block comprises a Graph Convolution Layer (GCL) followed by a MinCut Graph Pooling layer (GP). The GCL updates the node features of the graph, while the adjacency matrix remains unchanged. The visual separation is only for clarity, illustrating how the adjacency matrix is sent through skip connections to the decoder block. The GP reduces the graph's dimensionality while retaining essential topological features. The pooling generates a cluster assignment matrix and an adjacency matrix for the pooled graph. d, The decoder block mirrors the encoder block structure, featuring a Graph Upsampling (GU) layer to reverse the pooling process and a GCL to reconstruct the graph. At each decoder layer, the cluster assignment matrix from the encoder is used for upsampling, and the adjacency matrix is used in the graph convolution of the reconstructed graph.
  • Figure 2: GAUDI representation of Watts-Strogatz small-world graphs in latent space. GAUDI encodes Watts-Strogatz graphs, characterized by node degree $C$ and rewiring probability $p$, into latent variables. a, These graphs model structures ranging from highly ordered to completely random as $p$ increases. As $C$ increases, nodes form more connections, resulting in higher overall connectivity. b, Scatter plot of the first two principal components of the graphs' latent representations, with each point representing a compressed graph. The shape of the scatter points indicates the node degree $C$, where GAUDI effectively clusters graphs based on this parameter. In contrast, the color gradient represents the rewiring probability $p$, showing a smooth transition without distinct clustering, thus illustrating the seamless integration of $p$ variations within the latent space.
  • Figure 3: Latent space representation of protein assemblies by GAUDI. GAUDI encodes protein assemblies from simulated single-molecule localization data, categorized as either ring-shaped or spot-like. a, Four examples of protein assemblies exhibiting a ring-shaped distribution. The molecular localizations are rendered using a Gaussian convolutional kernel. b, Four examples of protein assemblies generated to follow a spot-like shape. c, Scatter plot showing the first two principal components of the latent representations for 300 samples. Colors denote the distribution type (ring-shaped or spot-like). The arrows indicate the placement in latent space of the samples shown in a and b. d, The ROC curve for the classification using a support vector machine on the latent space for all 10,000 samples (AUC=0.94). e, Confusion matrix for this classification.
  • Figure 4: GAUDI's latent space representation of self-driven collective behavior. GAUDI encodes self-driven collective dynamics from Vicsek simulations. a, Excerpts from simulations with varying flocking radius $R_{\rm f}$ and varying noise level $\eta$, as specified in the plots. The upper two examples have a smaller flocking radius, resulting in a shorter interaction range, while the bottom two examples have a larger flocking radius. The noise level is lower in the examples on the left and higher in those on the right. Three consecutive time steps for each particle are shown, with the darkest color visualizing the last time step. b, Scatter plot of the first two principal components of the latent space of 200 Vicsek model graphs obtained using GAUDI. The shape of the scatter points depends on the flocking radius $R_{\rm f}$ of the corresponding sample, where GAUDI effectively clusters graphs based on this parameter. The color corresponds to the noise level $\eta$, showing a smooth transition in the latent space. The arrows indicate the placement in latent space of the samples shown in a.
  • Figure 5: GAUDI's latent space captures dependencies between brain graphs and subject ages. a, Examples of thresholded brain connection graphs of two participants. The graphs are thresholded to keep only the $20\%$ of the connections with the highest integrity. The first example is from a 23-year-old participant, and the second from an 80-year-old. The colors correspond to anatomical groups of brain regions (frontal, parietal, temporal, occipital, subcortical, cerebellar), while the grayscale indicates the connection integrity. b, Localization of the regions on a brain surface, colored by anatomical group. The illustration is created using BRAPH2 (mijalkov_braph_2017; chang2025braph). c, The scatter plot shows the first two principal components of the graphs' latent space representations for 400 graphs. The color indicates the age of the participant and the arrows indicate the placement in latent space of the samples shown in a.
  • ...and 1 more figures
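
The Watts-Strogatz graphs of Figure 2 can be generated with the standard construction: a ring lattice in which every node has degree $C$, followed by rewiring each edge with probability $p$. The sketch below uses only the Python standard library. The function name `watts_strogatz`, its arguments, and the example values are illustrative assumptions, not the code or parameter settings used in the paper.

```python
import random

def watts_strogatz(n, c, p, seed=None):
    """Return the edge set of a Watts-Strogatz graph (illustrative sketch).

    n: number of nodes
    c: node degree of the initial ring lattice (must be even)
    p: probability of rewiring each lattice edge
    """
    if c % 2 != 0 or not 0 < c < n:
        raise ValueError("c must be even and satisfy 0 < c < n")
    rng = random.Random(seed)

    # Ring lattice: connect each node to its c/2 nearest neighbors on each side.
    edges = set()
    for i in range(n):
        for j in range(1, c // 2 + 1):
            edges.add((i, (i + j) % n))

    # Rewiring: with probability p, move the far end of each lattice edge
    # to a random node, avoiding self-loops and duplicate edges.
    for j in range(1, c // 2 + 1):
        for i in range(n):
            if rng.random() < p:
                candidates = [
                    k for k in range(n)
                    if k != i and (i, k) not in edges and (k, i) not in edges
                ]
                if candidates:
                    edges.remove((i, (i + j) % n))
                    edges.add((i, rng.choice(candidates)))
    return edges

# Example values (assumptions): an ordered lattice and a partially rewired graph.
ordered = watts_strogatz(n=20, c=4, p=0.0, seed=1)
rewired = watts_strogatz(n=20, c=4, p=0.3, seed=1)
```

Rewiring preserves the number of edges, so $C$ fixes the overall connectivity while $p$ moves the graph from fully ordered ($p = 0$) toward random ($p = 1$).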