L2G2G: a Scalable Local-to-Global Network Embedding with Graph Autoencoders

Ruikang Ouyang, Andrew Elliott, Stratis Limnios, Mihai Cucuringu, Gesine Reinert

TL;DR

The paper addresses the scalability–accuracy trade-off in graph autoencoders by introducing L2G2G, a Local2Global-based method that synchronizes patch-level embeddings during GAE training and uses a patch-wise, transformed decoder to enhance global consistency. It provides a theoretical complexity analysis and empirical evidence on real and synthetic datasets showing that L2G2G generally outperforms FastGAE and GAE+L2G, while delivering comparable or superior accuracy to full GAEs on large or dense graphs, with only modest training-time overhead. The approach leverages local subgraph processing (patches) combined with ongoing alignment to preserve global structure, enabling efficient training on large networks. The authors also contribute a practical discussion of when to use patch-based strategies and outline directions for accelerating the Local2Global step and extending the method to more heterogeneous graphs.

Abstract

For analysing real-world networks, graph representation learning is a popular tool. These methods, such as graph autoencoders (GAEs), typically rely on low-dimensional representations, also called embeddings, which are obtained through minimising a loss function; these embeddings are used with a decoder for downstream tasks such as node classification and edge prediction. While GAEs tend to be fairly accurate, they suffer from scalability issues. For improved speed, a Local2Global approach, which combines graph patch embeddings based on eigenvector synchronisation, was shown to be fast and achieve good accuracy. Here we propose L2G2G, a Local2Global method which improves GAE accuracy without sacrificing scalability. This improvement is achieved by dynamically synchronising the latent node representations while training the GAEs. It also benefits from the decoder computing only a local patch loss. Hence, aligning the local embeddings in each epoch utilises more information from the graph than a single post-training alignment does, while maintaining scalability. We illustrate on synthetic benchmarks, as well as real-world examples, that L2G2G achieves higher accuracy than the standard Local2Global approach and scales efficiently on the larger data sets. We find that for large and dense networks, it even outperforms the slow, but assumed more accurate, GAEs.

Paper Structure (12 sections, 1 equation, 4 figures, 3 tables, 1 algorithm)

This paper contains 12 sections, 1 equation, 4 figures, 3 tables, 1 algorithm.

Figures (4)

  • Figure 1: L2G2G pipeline for two patches. The two patches are shown in blue and yellow, and the overlapping nodes between them in green. Separate node embeddings for each patch are obtained via a single GCN. The decoder aligns the embeddings using the Local2Global synchronisation algorithm to yield a global embedding and then applies a standard sigmoid function. The GCN is then iteratively optimised using the training loss.
  • Figure 2: Training time of the baseline models (GAE, FastGAE and GAE+L2G) and L2G2G on benchmark data sets (excluding partitioning time). Note that the y-axis is on a log scale, and thus the faster methods are at least an order of magnitude faster.
  • Figure 3: Line plots of the ROC score and accuracy of L2G2G and GAE+L2G, trained on each data set, with different patch sizes. In each subplot, the blue lines show the metrics for L2G2G and the orange lines those for GAE+L2G. The shaded regions indicate the standard deviation of each metric.
  • Figure 4: Training time (excluding partitioning) of L2G2G and GAE+L2G on Cora (top) and Yelp (bottom) for varying patch size, with CPU results on the left and GPU results on the right. The x-axis is on a log scale.
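
The two-patch pipeline described in the Figure 1 caption can be illustrated with a small NumPy sketch. This is not the authors' implementation, and it makes several assumptions: the patch embeddings are synthetic rather than produced by a trained GCN, the alignment is a plain orthogonal Procrustes fit on the overlapping nodes (a simplified two-patch stand-in for the eigenvector synchronisation used by Local2Global), and the helper names `procrustes_align`, `merge_patches` and `decode` are hypothetical.

```python
import numpy as np


def procrustes_align(source, target):
    """Return (rotation, translation) such that source @ rotation + translation
    is the least-squares best match to target, with rotation orthogonal."""
    src_mean = source.mean(axis=0)
    tgt_mean = target.mean(axis=0)
    # The SVD of the cross-covariance yields the optimal orthogonal map.
    u, _, vt = np.linalg.svd((source - src_mean).T @ (target - tgt_mean))
    rotation = u @ vt
    translation = tgt_mean - src_mean @ rotation
    return rotation, translation


def merge_patches(n_nodes, patches):
    """Average the aligned patch embeddings node by node."""
    dim = patches[0][1].shape[1]
    total = np.zeros((n_nodes, dim))
    counts = np.zeros(n_nodes)
    for nodes, emb in patches:
        total[nodes] += emb
        counts[nodes] += 1
    return total / counts[:, None]


def decode(z):
    """Inner-product decoder followed by a sigmoid: edge probabilities."""
    return 1.0 / (1.0 + np.exp(-(z @ z.T)))


# Assumption: a synthetic "true" embedding replaces the output of a trained GCN.
rng = np.random.default_rng(0)
truth = rng.normal(size=(8, 2))

# Two patches that overlap on nodes 2, 3 and 4.
nodes_a = np.arange(0, 5)
nodes_b = np.arange(2, 8)
overlap = np.intersect1d(nodes_a, nodes_b)

# Patch A keeps the reference frame; patch B lives in a rotated, shifted frame.
emb_a = truth[nodes_a]
q, _ = np.linalg.qr(rng.normal(size=(2, 2)))
emb_b = truth[nodes_b] @ q + np.array([1.0, -2.0])

# Fit the alignment on the overlapping nodes only, then apply it to all of B.
idx_a = np.searchsorted(nodes_a, overlap)
idx_b = np.searchsorted(nodes_b, overlap)
rotation, translation = procrustes_align(emb_b[idx_b], emb_a[idx_a])
emb_b_aligned = emb_b @ rotation + translation

global_emb = merge_patches(8, [(nodes_a, emb_a), (nodes_b, emb_b_aligned)])
edge_prob = decode(global_emb)
```

In this noiseless toy setting the alignment recovers the reference frame, so `global_emb` matches `truth` up to floating-point error. As the abstract describes it, L2G2G repeats the alignment in every training epoch rather than once after training, so the loss is computed on embeddings that are already expressed in a common frame.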