How Low Can You Go? Searching for the Intrinsic Dimensionality of Complex Networks using Metric Node Embeddings

Nikolaos Nakis, Niels Raunkjær Holm, Andreas Lyhne Fiehn, Morten Mørup

TL;DR

The paper investigates how few dimensions suffice to exactly reconstruct complex networks, showing that Euclidean metric embeddings based on the latent distance model can match or beat LPCA in embedding efficiency. It introduces a binary-search procedure to bound the exact embedding dimension $D^*$ and a KD-tree-based, linearithmic reconstruction check that scales to large graphs, complemented by a hierarchical block distance model (HBDM) for scalable initialization. The theoretical result $D^*_{LPCA} - 2 \leq D^*_{L_2} \leq D^*_{LPCA}$, together with extensive experiments on graphs ranging from small networks to a million nodes, demonstrates substantially lower embedding dimensions than prior bounds, including successful exact reconstructions of large networks. These findings enable highly compact, lossless graph representations with broad implications for visualization, community detection, node classification, and link prediction, while offering a scalable and reproducible methodology.

Abstract

Low-dimensional embeddings are essential for machine learning tasks involving graphs, such as node classification, link prediction, community detection, network visualization, and network compression. Although recent studies have identified exact low-dimensional embeddings, the limits of the required embedding dimensions remain unclear. We prove that lower-dimensional embeddings are possible when using Euclidean metric embeddings as opposed to vector-based Logistic PCA (LPCA) embeddings. In particular, we provide an efficient logarithmic search procedure for identifying the exact embedding dimension and demonstrate how metric embeddings enable inference of the exact embedding dimensions of large-scale networks by exploiting metric properties to obtain linearithmic scaling. Empirically, we show that our approach extracts substantially lower-dimensional representations of small networks than previously reported. For the first time, we demonstrate that even large-scale networks can be effectively embedded in very low-dimensional spaces, and we provide examples of scalable, exact reconstruction for graphs with up to a million nodes. Our approach highlights that the intrinsic dimensionality of networks is substantially lower than previously reported and provides a computationally efficient assessment of the exact embedding dimension, even for large-scale networks. The surprisingly low-dimensional representations achieved demonstrate that networks in general can be losslessly represented using very low-dimensional feature spaces, which can guide existing network analysis tasks ranging from community detection and node classification to structure-revealing exact network visualizations.
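The logarithmic search mentioned in the abstract can be pictured as a binary search over candidate dimensions. The sketch below is an illustration only, not the authors' implementation: `can_embed` is a hypothetical oracle standing in for "train an embedding in d dimensions and check whether it reconstructs the graph exactly", `d_max` is an assumed starting upper bound, and the search assumes that success at dimension d implies success at every larger dimension. Because a real training run can fail for optimization reasons, the value returned in practice is best read as an upper bound on $D^*$.

```python
from typing import Callable, List


def lowest_exact_dimension(can_embed: Callable[[int], bool], d_max: int) -> int:
    """Return the smallest d in [1, d_max] for which can_embed(d) is True.

    Assumes monotonicity: if can_embed(d) holds, it holds for all larger d.
    """
    if not can_embed(d_max):
        raise ValueError("no exact embedding found at d_max")
    lo, hi = 1, d_max  # invariant: can_embed(hi) is True
    while lo < hi:
        mid = (lo + hi) // 2
        if can_embed(mid):
            hi = mid        # mid works, so the answer is at most mid
        else:
            lo = mid + 1    # mid fails, so the answer is above mid
    return lo


# Toy oracle (hypothetical): pretend exact reconstruction succeeds from d = 6 upward.
calls: List[int] = []


def toy_oracle(d: int) -> bool:
    calls.append(d)
    return d >= 6


found = lowest_exact_dimension(toy_oracle, 64)
print(found, len(calls))
```

The number of oracle calls grows logarithmically with `d_max`, which matters because each call corresponds to a full embedding optimization.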

Paper Structure

This paper contains 19 sections, 2 theorems, 16 equations, 15 figures, 8 tables, 1 algorithm.

Key Result

Theorem 2.1

Let $D^*_{LPCA}$ and $D^*_{L_2}$ denote the lowest exact embedding dimension for a graph embedding obtainable by optimization w.r.t. the $\mathcal{R}_{LPCA}$-reconstruction and $\mathcal{R}_{L_2}$-reconstruction, respectively. We then have the relationship $D^*_{LPCA} - 2 \leq D^*_{L_2} \leq D^*_{LPCA}$.

Figures (15)

  • Figure 1: Model Overview: The input network is embedded into a low-dimensional space using matrices $\mathbf{X}$ and $\mathbf{Y}$, defining an upper bound $D^*$ on intrinsic dimensionality for structure-preserving reconstruction via the $\beta$-radius. Connected nodes fall within each other's $\beta$-radius, ensuring exact reconstruction.
  • Figure 1: Graphs used in the experiments along with some statistics. Comp. Arboricity denotes the maximal arboricity obtained when only the connected components of the graph are considered as subgraphs. Because the actual arboricity requires an infeasible exhaustive evaluation of all subsets of nodes, this value provides a lower bound on the arboricity.
  • Figure 2: Example graphs with a community and an anticommunity structure, respectively, and their corresponding $\mathcal{R}_{LPCA}$- and $\mathcal{R}_{L_2}$-embeddings (Lines/Circles denote link thresholds for $\mathcal{R}_{L_2}$).
  • Figure 2: Lowest exact embedding dimensions ($D^*$) found for each dataset across 5 searches, along with the mean and standard deviation across the searches. Directed networks are marked with a "*" because they are not comparable with the results of Chanpuriya et al., who converted all networks to undirected networks.
  • Figure 3: Visualization of the training statistics over 100 test runs on the synthetic block graph shown in the left panel. Each bar is the mean exact embedding dimension (EED), and the error bars correspond to the standard deviation of the measurements. An extended version of this figure can be seen in the supplementary material.
  • ...and 10 more figures
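
The $\beta$-radius rule described in the model overview caption can be illustrated with a small reconstruction check. This is a minimal sketch under stated assumptions, not the paper's method: it assumes a link from node i to node j is predicted exactly when the Euclidean distance between row i of $\mathbf{X}$ and row j of $\mathbf{Y}$ is strictly below `beta`, and it compares all pairs by brute force (quadratic time) rather than using the KD-tree-based linearithmic check the paper describes. The function name and the toy path graph are hypothetical.

```python
import math
from typing import Iterable, Sequence, Tuple


def reconstructs_exactly(
    X: Sequence[Sequence[float]],
    Y: Sequence[Sequence[float]],
    edges: Iterable[Tuple[int, int]],
    beta: float,
) -> bool:
    """Check whether thresholding distances at beta reproduces the edge set.

    Assumed rule: edge (i, j) is predicted iff ||X[i] - Y[j]|| < beta.
    Self-pairs (i, i) are ignored.
    """
    edge_set = set(edges)
    n = len(X)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            predicted = math.dist(X[i], Y[j]) < beta
            if predicted != ((i, j) in edge_set):
                return False  # one wrong pair breaks exact reconstruction
    return True


# Toy example: the undirected path 0 - 1 - 2 embedded on a line (one dimension).
points = [(0.0,), (1.0,), (2.0,)]
path_edges = [(0, 1), (1, 0), (1, 2), (2, 1)]

ok = reconstructs_exactly(points, points, path_edges, beta=1.5)
too_wide = reconstructs_exactly(points, points, path_edges, beta=2.5)
print(ok, too_wide)
```

With `beta=1.5` only neighbouring nodes fall inside each other's radius, so the path is reconstructed exactly. Widening the radius to 2.5 also links nodes 0 and 2, which breaks the reconstruction.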

Theorems & Definitions (3)

  • Theorem 2.1
  • Theorem A.1
  • proof