Mapping bipartite networks into multidimensional hyperbolic spaces

Robert Jankowski, Roya Aliakbarisani, M. Ángeles Serrano, Marián Boguñá

TL;DR

The paper tackles projection-induced distortions in bipartite networks by introducing a model-based, multidimensional hyperbolic embedding where both node types share a common similarity space. The bipartite-$\mathbb{S}^D/\mathbb{H}^{D+1}$ model uses a gravity-like connection probability that decays with distance; the B-Mercator algorithm infers the embedding, including the nodes' hidden degrees and the inverse temperature $\beta_b$. Validation on synthetic data shows accurate coordinate recovery and parameter inference, while real-world networks (Unicodelang, Metabolic, Flavor) yield embeddings that preserve topology and reveal interpretable structures. B-Mercator also boosts performance on graph ML tasks such as node classification and link prediction, and enables the generation of realistic synthetic data for secure sharing. Overall, the approach provides a principled, geometry-based framework for uncovering hidden structure in bipartite systems and offers practical benefits for downstream analysis and data security.

Abstract

Bipartite networks appear in many real-world contexts, linking entities across two distinct sets. They are often analyzed via one-mode projections, but such projections can introduce artificial correlations and inflated clustering, obscuring the true underlying structure. In this paper, we propose a geometric model for bipartite networks that leverages the high levels of bipartite four-cycles as a measure of clustering to place both node types in the same similarity space, where link probabilities decrease with distance. Additionally, we introduce B-Mercator, an algorithm that infers node positions from the bipartite structure. We evaluate its performance on diverse datasets, illustrating how the resulting embeddings improve downstream tasks such as node classification and distance-based link prediction in machine learning. These hyperbolic embeddings also enable the generation of synthetic networks with node features closely resembling real-world ones, thereby safeguarding sensitive information while allowing secure data sharing. In addition, we show how preserving bipartite structure avoids the pitfalls of projection-based techniques, yielding more accurate descriptions and better performance. Our method provides a robust framework for uncovering hidden geometry in complex bipartite systems.
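To make the phrase "link probabilities decrease with distance" concrete, here is a minimal Python sketch of a gravity-like connection probability on the circle ($D=1$). The paper's own equation is not reproduced in this summary, so the functional form below is an assumption borrowed from the standard $\mathbb{S}^1$ model: a Fermi-like function of the angular distance rescaled by the product of hidden degrees. The names `mu` (a normalisation constant that would set the average degree) and `radius` (the radius of the circle) are hypothetical parameters introduced for illustration.

```python
import math


def angular_distance(theta_a, theta_b):
    """Shortest angular separation between two points on a circle, in [0, pi]."""
    d = abs(theta_a - theta_b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def connection_probability(theta_a, kappa_a, theta_b, kappa_b, beta_b, mu, radius):
    """Assumed gravity-like form for D = 1 (not taken from the paper):

        p = 1 / (1 + chi ** beta_b),  chi = radius * dtheta / (mu * kappa_a * kappa_b)

    A larger angular distance lowers p; larger hidden degrees raise it.
    """
    chi = radius * angular_distance(theta_a, theta_b) / (mu * kappa_a * kappa_b)
    return 1.0 / (1.0 + chi ** beta_b)


# Toy values (illustrative only): one type-A node and two type-B nodes.
p_near = connection_probability(0.0, 2.0, 0.1, 3.0, beta_b=1.5, mu=1.0, radius=10.0)
p_far = connection_probability(0.0, 2.0, 2.0, 3.0, beta_b=1.5, mu=1.0, radius=10.0)
print(p_near > p_far)
```

In this form, $\beta_b$ controls how sharply the probability falls off once the rescaled distance exceeds one, which is why it acts as an inverse temperature.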

Paper Structure

This paper contains 15 sections, 9 equations, 6 figures.

Figures (6)

  • Figure 1: Schematic representation of the bipartite-$\mathbb{S}^D$ model in dimension (a) $D=1$ and (b) $D=2$. Nodes A are shown as circles whereas nodes B are shown as squares whose sizes are proportional to the nodes' expected degrees. The angular distances between nodes A and B are highlighted ($\Delta\theta_{A_1, B_1}$ and $\Delta\theta_{A_2, B_2}$). Light grey lines represent the edges in the bipartite network generated by Eq. \ref{eq:prob}.
  • Figure 2: Validation of B-Mercator on synthetic bipartite networks. Relationship between the original and the inferred coordinates of the (a) bipartite-$\mathbb{S}^1$ and (b) bipartite-$\mathbb{S}^2$ models. In the top left corner of each panel, we report the Spearman correlation coefficient between the inferred and original coordinates. Since the inferred coordinates might be rotated, we transform them to minimize the average angular distance between the original and inferred coordinates (Supplementary Section IV in \cite{jankowski2023dmercator}). Parameters: $N_A=500$ (number of type-A nodes), $N_B=1000$ (number of type-B nodes), $\gamma_A=2.7$ (exponent of the power-law degree distribution of type-A nodes), $\gamma_B=2.1$ (exponent of the power-law degree distribution of type-B nodes), $\langle k_A \rangle=10$ (average degree of type-A nodes), $\beta_b = 1.5D$ (inverse temperature), with dimension $D=1$ for (a) and $D=2$ for (b).
  • Figure 3: Bipartite greedy routing in synthetic networks. (a) Schematic view of the greedy routing protocol. We select a type-A node as the origin ($\mathbf{S_A}$) and a type-B node ($\mathbf{T_B}$) as the destination. The red arrows show how the message is forwarded towards the destination. In the second example, we select two type-B nodes as source ($\mathbf{S_{B^\prime}}$) and destination ($\mathbf{T_{B^{\prime\prime}}}$) and outline the greedy path in purple. The line width is proportional to the connection probability (Eq. \ref{eq:prob}). (b) Success rate as a function of the embedding dimension for networks generated in $D=\{1,2,3,4\}$. Here we consider a navigation protocol in which the source is a type-A node and the destination is a type-B node. Results are obtained by averaging over 10 realizations with parameters: $N_A=500$ (number of type-A nodes), $N_B=500$ (number of type-B nodes), $\gamma_A=2.5$ (exponent of the power-law degree distribution of type-A nodes), $\gamma_B=2.5$ (exponent of the power-law degree distribution of type-B nodes), $\langle k_A \rangle=10$ (average degree of type-A nodes), $\beta_b = 1.5D$ (inverse temperature). Each box ranges from the first quartile to the third quartile, with a horizontal line at the median. The whiskers extend from each quartile to the minimum or maximum.
  • Figure 4: Visualization of the bipartite-$\mathbb{S}^1$ embedding of the Unicodelang dataset per country or language. Panels (a, b, c) show countries where a given language is spoken, i.e., the neighbors of the language node. The size of the nodes is proportional to the number of language speakers in that country. The color corresponds to the geographical region in which the country is located. A star marker indicates the position of a given language. In panels (d, e, f), we depict all languages spoken in a given country, i.e., the neighbors of the country node. The size of the nodes is proportional to the fraction of speakers of a given language. The color represents that language's script. A cross marker indicates the position of a given country.
  • Figure 5: Language or country diversity for the top 15 highest-degree nodes. Panel (a) shows violin plots of angular distances between each language and its neighboring countries, with colors indicating script type. Panel (b) presents analogous plots for countries, with colors representing geographic region. In both panels, nodes are sorted by decreasing degree, with each node’s degree (in brackets) indicated next to its label. Only the central 95% of the data is plotted, i.e., the data between the 2.5th and 97.5th percentiles. A black line highlights the median value in each plot.
  • ...and 1 more figure
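
The greedy routing protocol described in the Figure 3 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the message is always forwarded to the neighbor closest to the destination, and delivery is counted as failed as soon as the message would return to an already visited node. The caption does not specify the distance used for forwarding, so the sketch takes it as a parameter and defaults to the angular distance on a circle. The function names and the toy network are hypothetical.

```python
import math


def circle_distance(a, b):
    """Shortest angular separation between two angles, in [0, pi]."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def greedy_route(adjacency, coords, source, target, distance=circle_distance):
    """Forward a message greedily; return (path, delivered).

    adjacency: dict mapping each node to a list of its neighbors
               (in a bipartite network these are always of the other type).
    coords:    dict mapping each node to its coordinate.
    """
    path, visited, current = [source], {source}, source
    while current != target:
        neighbors = adjacency[current]
        if not neighbors:
            return path, False  # dead end
        nxt = min(neighbors, key=lambda n: distance(coords[n], coords[target]))
        if nxt in visited:
            return path, False  # loop detected: the message is dropped
        path.append(nxt)
        visited.add(nxt)
        current = nxt
    return path, True


# Toy bipartite network (illustrative only): type-A nodes a1, a2; type-B nodes b1, b2, b3.
adjacency = {
    "a1": ["b1", "b3"], "a2": ["b1", "b2"],
    "b1": ["a1", "a2"], "b2": ["a2"], "b3": ["a1"],
}
coords = {"a1": 0.0, "b1": 0.5, "a2": 1.0, "b2": 1.5, "b3": 3.0}

ok_path, ok = greedy_route(adjacency, coords, "a1", "b2")
bad_path, bad = greedy_route(adjacency, coords, "a2", "b3")
print(ok_path, ok)
print(bad_path, bad)
```

The second call shows how greedy forwarding can fail even though a path exists: the message moves to the neighbor that looks closest to the destination and gets stuck there. A success rate like the one in panel (b) would be the fraction of source-destination pairs for which `delivered` is true.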