Feature-aware ultra-low dimensional reduction of real networks

Robert Jankowski, Pegah Hozhabrierdi, Marián Boguñá, M. Ángeles Serrano

TL;DR

FiD-Mercator tackles the integration of node features with network topology to produce ultra-low dimensional hyperbolic embeddings. It builds on the $\mathbb{S}^2$ model ($D=2$): node coordinates on the two-sphere are initialized by applying UMAP to the node features and then refined through likelihood-based optimization, yielding a joint topology-feature representation. The key finding is that downstream tasks such as link prediction and node classification improve when the correlation between features and topology is high, while the local properties captured by hyperbolic embeddings are preserved. This work points to a principled path toward joint topology-feature embedding methods that adapt to the relevance of metadata and enhance robustness across diverse real networks.
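The TL;DR refers to the $\mathbb{S}^2$ model. The sketch below evaluates its connection probability, assuming the standard $\mathbb{S}^D$ form of the Mercator family at $D=2$: nodes sit on a two-sphere of radius $R$ with $4\pi R^2 = N$, and $p_{ij} = 1/(1+\chi_{ij}^{\beta})$ with $\chi_{ij} = R\,\Delta\theta_{ij}/\sqrt{\mu\kappa_i\kappa_j}$ and $\mu = \beta\sin(2\pi/\beta)/(2\pi^2\langle k\rangle)$. These formulas are stated here as assumptions (they are not given in this summary), and all function names are hypothetical.

```python
import math
from typing import Sequence

# Assumed S^D model at D = 2; function names are hypothetical, for illustration only.

def s2_radius(n_nodes: int) -> float:
    """Radius of the two-sphere when N nodes are spread with unit density (4*pi*R^2 = N)."""
    return math.sqrt(n_nodes / (4.0 * math.pi))

def s2_mu(beta: float, mean_degree: float) -> float:
    """mu = beta*sin(2*pi/beta) / (2*pi^2*<k>); the model needs beta > D = 2."""
    if beta <= 2.0:
        raise ValueError("the S^2 model requires beta > 2")
    return beta * math.sin(2.0 * math.pi / beta) / (2.0 * math.pi ** 2 * mean_degree)

def angular_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Angle in radians between two position vectors (they need not be unit length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.acos(max(-1.0, min(1.0, dot / norm)))  # clamp guards against rounding

def connection_probability(kappa_i: float, kappa_j: float,
                           x_i: Sequence[float], x_j: Sequence[float],
                           beta: float, mu: float, radius: float) -> float:
    """p_ij = 1 / (1 + chi^beta) with chi = R * dtheta / sqrt(mu * kappa_i * kappa_j)."""
    chi = radius * angular_distance(x_i, x_j) / math.sqrt(mu * kappa_i * kappa_j)
    return 1.0 / (1.0 + chi ** beta)

# Toy usage: two nodes with hidden degree 10 in a network of 1000 nodes, beta = 3.
R = s2_radius(1000)
mu = s2_mu(beta=3.0, mean_degree=10.0)
p_near = connection_probability(10, 10, (0, 0, 1), (0, 0.05, 1), 3.0, mu, R)
p_far = connection_probability(10, 10, (0, 0, 1), (0, 0, -1), 3.0, mu, R)
```

Nearby nodes connect with high probability and antipodal ones almost never; the exponent $\beta$ controls how sharply the probability decays with angular distance, which is what gives the model its clustering.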

Abstract

In existing models and embedding methods of networked systems, node features describing their qualities are usually overlooked in favor of focusing solely on node connectivity. This study introduces $FiD$-Mercator, a model-based ultra-low dimensional reduction technique that integrates node features with network structure to create $D$-dimensional maps of complex networks in a hyperbolic space. This embedding method efficiently uses features as an initial condition, guiding the search of nodes' coordinates towards an optimal solution. The research reveals that downstream task performance improves with the correlation between network connectivity and features, emphasizing the importance of such correlation for enhancing the description and predictability of real networks. Simultaneously, hyperbolic embedding's ability to reproduce local network properties remains unaffected by the inclusion of features. The findings highlight the need to develop network embedding techniques that exploit such correlations by jointly optimizing network structure and feature association.
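The abstract describes features as an initial condition that guides the search for node coordinates. As a rough illustration of what such a search optimizes, the sketch below assumes the usual log-likelihood of a network under a model that assigns each node pair a connection probability $p_{ij}$, namely $\ln\mathcal{L} = \sum_{i<j}\left[a_{ij}\ln p_{ij} + (1-a_{ij})\ln(1-p_{ij})\right]$, together with a hypothetical greedy rule that keeps a proposed position only if it raises a node's local log-likelihood. The function names and the greedy rule are illustrative assumptions, not the paper's exact procedure.

```python
import math
from typing import Callable, Iterable, Tuple, TypeVar

Position = TypeVar("Position")

def log_likelihood(n: int, edges: Iterable[Tuple[int, int]],
                   prob: Callable[[int, int], float], eps: float = 1e-12) -> float:
    """Sum over all pairs i < j of a_ij*ln(p_ij) + (1 - a_ij)*ln(1 - p_ij)."""
    linked = {(min(i, j), max(i, j)) for i, j in edges}  # undirected edge set
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            p = min(max(prob(i, j), eps), 1.0 - eps)  # clip to avoid log(0)
            total += math.log(p) if (i, j) in linked else math.log(1.0 - p)
    return total

def refine_node(node: int, current: Position, candidates: Iterable[Position],
                local_loglik: Callable[[int, Position], float]) -> Position:
    """Hypothetical greedy step: return the highest-scoring of current and candidate positions."""
    best, best_score = current, local_loglik(node, current)
    for cand in candidates:
        score = local_loglik(node, cand)
        if score > best_score:
            best, best_score = cand, score
    return best

# Toy check: three nodes, one edge, every pair assigned p = 0.5.
print(log_likelihood(3, [(0, 1)], lambda i, j: 0.5))  # 3 * ln(0.5), about -2.079
```

Because `refine_node` only accepts improvements, a good starting point (here, in the paper's framing, the feature-based UMAP layout) matters: the search stays close to where it begins unless the likelihood clearly favors moving.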

Paper Structure (13 sections, 6 equations, 7 figures, 1 table)

Figures (7)

  • Figure 1: Schematic view of the proposed method $FiD$-Mercator. First, we infer the hidden degrees $\kappa$ and the parameter $\beta$ from the network. Second, we use the UMAP algorithm to map the nodes onto the two-sphere using the feature matrix. Finally, the initial coordinates from the previous step are used in the maximum likelihood procedure, which fits the data to the $\mathbb{S}^2$ model. The size of each node is proportional to its expected degree, and nodes are colored according to their communities. Black lines on the two-sphere represent connections generated according to the model.
  • Figure 2: Two-dimensional maps of the LastFM network. Each row corresponds to a different embedding method, and each column to a different assignment of labels. The size of a node is proportional to its expected degree, and its color indicates the community it belongs to. For clarity, only connections whose $\mathbb{S}^2$ model connection probability satisfies $p_{ij} > 0.999$ are shown.
  • Figure 3: (a,b) Evolution of the global log-likelihood during the maximum likelihood (ML) optimization steps. Each ML step takes a subset of nodes for which new coordinates are proposed. The nodes are ordered by the onion decomposition [hebert2016multi]. Note that we plot the negative log-likelihood, so lower values are better. (c,d) Average angular distance between each node's position in the initial embedding (Laplacian Eigenmaps for $D$-Mercator; UMAP for $FiD$-Mercator) and its position after each ML step. The average is computed only over nodes with $k>2$.
  • Figure 4: Validation of the embeddings of the Amazon Photo network. Panel (a) shows the complementary cumulative degree distribution and panel (b) the clustering spectrum $\bar{c}(k)$. Symbols correspond to the values in the original network; lines indicate estimates of the expected values over the ensembles of random networks generated from the different embeddings. The $\mathbb{S}^2$ model was used to generate 100 synthetic networks with the parameters and positions inferred by $FiD$-Mercator, $D$-Mercator, or UMAP. Error bars show the $2\sigma$ confidence interval around the expected value. Panel (c) shows the average nearest-neighbor degree $\bar{k}_{nn}(k)$. Panel (d) compares the connection probability expected from the estimated $\beta$ (expected) with the actual connection probability computed from the inferred hidden variables.
  • Figure 5: Precision as a function of the fraction of missing links in a link prediction task for (a,b) Amazon Photo, (c,d) DBLP, and (e,f) Twitch PTBR networks. Panels (a,c,e) show results for the scheme in which the embeddings are of the complete network, whereas panels (b,d,f) correspond to the second scheme, in which the giant connected component (GCC) of the incomplete network is embedded. Results are averaged over $5$ realizations.
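Figure 5 reports link-prediction precision. A minimal sketch, assuming the common convention in which every node pair absent from the observed (incomplete) network is ranked by a score, such as the embedding's connection probability $p_{ij}$, and precision is the fraction of true missing links among the top $L$ ranked pairs, with $L$ equal to the number of removed links. The function name and this exact protocol are assumptions; the paper's evaluation details may differ.

```python
from typing import Callable, Iterable, Tuple

Pair = Tuple[int, int]

def _ordered(i: int, j: int) -> Pair:
    """Store undirected pairs as (smaller, larger)."""
    return (i, j) if i < j else (j, i)

def link_prediction_precision(n: int, observed_edges: Iterable[Pair],
                              missing_edges: Iterable[Pair],
                              score: Callable[[int, int], float]) -> float:
    """Hypothetical helper: share of true missing links among the top-L ranked candidates."""
    observed = {_ordered(i, j) for i, j in observed_edges}
    missing = {_ordered(i, j) for i, j in missing_edges}
    if not missing:
        return 0.0
    # Every pair not present in the observed network is a candidate link.
    candidates = [(i, j) for i in range(n) for j in range(i + 1, n)
                  if (i, j) not in observed]
    # Rank candidates by descending score (e.g. p_ij from an embedding).
    candidates.sort(key=lambda pair: score(*pair), reverse=True)
    top = candidates[:len(missing)]
    return sum(pair in missing for pair in top) / len(missing)

# Hypothetical toy case: edge (2, 3) was removed, and the score ranks it first.
precision = link_prediction_precision(
    4, observed_edges=[(0, 1), (1, 2)], missing_edges=[(2, 3)],
    score=lambda i, j: 1.0 if (i, j) == (2, 3) else 0.0)
print(precision)  # 1.0
```

Ties in the score are broken by pair enumeration order here, a simplification; averaging over several random removals, as the caption does with 5 realizations, reduces the influence of any single split.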