Perfect Recovery for Random Geometric Graph Matching with Shallow Graph Neural Networks
Suqi Liu, Morgane Austern
TL;DR
This work analyzes graph alignment when two graphs are noisy, edge-subsampled copies of a random geometric graph with sparse binary features. It introduces a two-layer graph neural network (GNN) with a simple thresholding scheme to produce embeddings, then solves an assignment problem with the Hungarian algorithm to recover the true vertex permutation. The authors prove that perfect recovery is achievable with high probability in the parameter regime $\min\{s, \frac{qm}{s}, \frac{qm}{\sigma^2 s^2}\} \gg \log n + \log d$ and show that the noise bound is tight up to logarithmic factors. They also show that direct feature-only matching can fail in regimes where the GNN succeeds. Empirical results on synthetic data and real datasets (Cora, CiteSeer) corroborate the theory: GNN-based alignment uses both graph structure and noisy features to outperform linear methods, especially as noise grows with graph size. The study provides theoretical grounding for the effectiveness of shallow GNNs in graph alignment and illuminates the bias-variance trade-off inherent in aggregating neighbor information on geometric graphs.
Abstract
We study the graph matching problem in the presence of vertex feature information using shallow graph neural networks. Specifically, given two graphs that are independent perturbations of a single random geometric graph with sparse binary features, the task is to recover an unknown one-to-one mapping between the vertices of the two graphs. We show that, under certain conditions on the sparsity and noise level of the feature vectors, a carefully designed two-layer graph neural network can, with high probability, recover the correct mapping between the vertices with the help of the graph structure. Additionally, we prove that our condition on the noise parameter is tight up to logarithmic factors. Finally, we compare the performance of the graph neural network to directly solving an assignment problem using the noisy vertex features and demonstrate that when the noise level is at least constant, this direct matching fails to achieve perfect recovery, whereas the graph neural network can tolerate noise levels growing as fast as a power of the size of the graph. Our theoretical findings are further supported by numerical studies as well as experiments on real-world data.
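The pipeline summarized above (two rounds of neighbor aggregation with a threshold, followed by an assignment step) can be illustrated with a minimal Python sketch. It is not the authors' construction: the self-loops, the row normalization, the threshold `tau`, the squared-distance cost, and the function names `two_layer_embed` and `match_embeddings` are all assumptions made for illustration. The assignment step uses SciPy's `linear_sum_assignment`.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def two_layer_embed(adj, feats, tau):
    """Two rounds of neighbor averaging with a threshold in between.

    Assumed form (not taken from the paper): self-loops are added, rows are
    normalized, and `tau` is a hypothetical threshold on the first-layer mean.
    """
    a_hat = adj + np.eye(adj.shape[0])                # add self-loops
    a_hat = a_hat / a_hat.sum(axis=1, keepdims=True)  # row-normalize
    hidden = (a_hat @ feats > tau).astype(float)      # layer 1: average, then threshold
    return a_hat @ hidden                             # layer 2: average again


def match_embeddings(emb_a, emb_b):
    """Solve a linear assignment on pairwise squared distances.

    Returns an array `pi` where `pi[i]` is the vertex of graph B matched
    to vertex `i` of graph A.
    """
    cost = ((emb_a[:, None, :] - emb_b[None, :, :]) ** 2).sum(axis=2)
    _, cols = linear_sum_assignment(cost)
    return cols


# Toy, noise-free input: a 4-vertex path graph with one-hot features.
adj_a = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)
feats_a = np.eye(4)

# Hidden relabeling: vertex i of graph A is vertex perm[i] of graph B.
perm = np.array([2, 0, 3, 1])
adj_b = np.empty_like(adj_a)
adj_b[np.ix_(perm, perm)] = adj_a
feats_b = np.empty_like(feats_a)
feats_b[perm] = feats_a

emb_a = two_layer_embed(adj_a, feats_a, tau=0.25)
emb_b = two_layer_embed(adj_b, feats_b, tau=0.25)
recovered = match_embeddings(emb_a, emb_b)
print(recovered, perm)
```

In this noise-free toy case the rows of `emb_b` are a reordering of the rows of `emb_a`, and those rows are distinct, so the assignment returns `perm` exactly. The sketch says nothing about the noisy regime analyzed in the paper, where the choice of threshold and the recovery conditions matter.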
