GNUMAP: A Parameter-Free Approach to Unsupervised Dimensionality Reduction via Graph Neural Networks
Jihee You, So Won Jeong, Claire Donnat
TL;DR
The paper addresses the challenge of evaluating and deploying unsupervised GNN embeddings for dimensionality reduction. It first benchmarks existing methods, then introduces GNUMAP, a parameter-free Graph-Neural UMAP that extends UMAP to graphs through an autoencoder framework. GNUMAP defines high-dimensional connectivity from the graph adjacency, learns low-dimensional node representations $y_i$ with a GNN, and minimizes a cross-entropy loss between the high-dimensional connectivities and the low-dimensional connectivities $q_{ij} = \frac{1}{1 + \alpha d(y_i,y_j)^{2\beta}}$, with defaults $\alpha=1.57$ and $\beta=0.89$; batch whitening of the embeddings is added for stability. Extensive experiments on synthetic manifolds and real-world networks show strong performance and robustness relative to state-of-the-art GNN-based methods and classical dimensionality reduction techniques, with the added advantage that GNUMAP is largely hyperparameter-free. The work highlights GNUMAP's practical value for dimensionality reduction in domains such as biology, while noting that it is limited to homophilic graphs and pointing to density-aware extensions as future work.
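The objective described above can be sketched in NumPy. This is a minimal illustration under several assumptions that the summary does not state: the high-dimensional connectivity $p_{ij}$ is taken to be the binary adjacency matrix itself, $d$ is the Euclidean distance, the cross-entropy is summed over all off-diagonal node pairs (UMAP itself relies on negative sampling), and "batch whitening" is implemented as ZCA whitening of a batch of embeddings. A random matrix stands in for the GNN output, and the function names (`batch_whiten`, `low_dim_connectivity`, `fuzzy_cross_entropy`) are hypothetical, not GNUMAP's actual code.

```python
import numpy as np

# Defaults quoted in the summary; d is assumed to be the Euclidean distance.
ALPHA, BETA = 1.57, 0.89


def batch_whiten(Z, eps=1e-5):
    """ZCA-whiten a batch of embeddings (an assumed whitening variant).

    Centers each column and rescales so the sample covariance is ~identity.
    """
    Zc = Z - Z.mean(axis=0, keepdims=True)
    cov = Zc.T @ Zc / (Zc.shape[0] - 1)
    evals, evecs = np.linalg.eigh(cov)
    W = evecs @ np.diag(1.0 / np.sqrt(evals + eps)) @ evecs.T
    return Zc @ W


def low_dim_connectivity(Y, alpha=ALPHA, beta=BETA):
    """q_ij = 1 / (1 + alpha * d(y_i, y_j)^(2 * beta)) for every pair of rows of Y."""
    diff = Y[:, None, :] - Y[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)  # d(y_i, y_j)^2
    return 1.0 / (1.0 + alpha * sq_dist ** beta)  # (d^2)^beta == d^(2*beta)


def fuzzy_cross_entropy(P, Q, eps=1e-12):
    """Binary cross-entropy between P (high-dim) and Q (low-dim), off-diagonal pairs only."""
    Q = np.clip(Q, eps, 1.0 - eps)  # avoid log(0)
    off_diag = ~np.eye(P.shape[0], dtype=bool)
    ce = -(P * np.log(Q) + (1.0 - P) * np.log(1.0 - Q))
    return float(ce[off_diag].sum())


if __name__ == "__main__":
    # Hypothetical toy graph: a 6-node ring whose adjacency is used directly as p_ij (assumption).
    n = 6
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0

    # Random 2-D embeddings stand in for the GNN encoder's output.
    rng = np.random.default_rng(0)
    Y = batch_whiten(rng.normal(size=(n, 2)))
    print("loss:", fuzzy_cross_entropy(A, low_dim_connectivity(Y)))
```

In a full model, the whitened embeddings would come from a GNN encoder and the loss would be minimized by gradient descent in an autodiff framework; this NumPy version only evaluates the objective for a fixed set of embeddings.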
Abstract
With the proliferation of Graph Neural Network (GNN) methods stemming from contrastive learning, unsupervised node representation learning for graph data is rapidly gaining traction in fields ranging from biology to molecular dynamics, where it is often used as a dimensionality reduction tool. However, there remains a significant gap in understanding the quality of the low-dimensional node representations these methods produce, particularly beyond well-curated academic datasets. To address this gap, we present the first comprehensive benchmark of unsupervised node embedding techniques for dimensionality reduction, covering a range of manifold learning tasks and performance metrics. We highlight the sensitivity of current methods to hyperparameter choices, a fundamental obstacle to their use in real-world settings where no established methodology exists for rigorous hyperparameter selection. To address this issue, we introduce GNUMAP, a robust and parameter-free method for unsupervised node representation learning that combines the traditional UMAP approach with the expressivity of the GNN framework. We show that GNUMAP consistently outperforms existing state-of-the-art GNN embedding methods in a variety of contexts, including synthetic geometric datasets, citation networks, and real-world biomedical data, making it a simple but reliable dimensionality reduction tool.
