Structure and inference in hypergraphs with node attributes

Anna Badalyan; Nicolò Ruggeri; Caterina De Bacco

TL;DR

This study presents a model that combines higher-order interactions with node attributes to improve community detection in hypergraphs. It shows strong performance in hyperedge prediction and selects community divisions that correlate with the attributes only when these are informative.

Abstract

Many networked datasets with units interacting in groups of two or more, encoded with hypergraphs, are accompanied by extra information about nodes, such as the role of an individual in a workplace. Here we show how these node attributes can be used to improve our understanding of the structure resulting from higher-order interactions. We consider the problem of community detection in hypergraphs and develop a principled model that combines higher-order interactions and node attributes to better represent the observed interactions and to detect communities more accurately than using either of these types of information alone. The method automatically learns from the input data the extent to which structure and attributes contribute to explaining the data, down-weighting or discarding attributes if they are not informative. Our algorithmic implementation is efficient and scales to large hypergraphs and to interactions involving large numbers of units. We apply our method to a variety of systems, showing strong performance in hyperedge prediction tasks and in selecting community divisions that correlate with attributes when these are informative, but discarding them otherwise. Our approach illustrates the advantage of using informative node attributes when available with higher-order data.

Paper Structure (5 sections, 32 equations, 9 figures, 5 tables, 1 algorithm)

Synthetic data generation
Solving for the membership matrix updates
Alternative formulation for excluding attributes
The advantages of using a hypergraph representation
Additional results of community detection

Figures (9)

Figure 1: Community detection in synthetic hypergraphs. We show the cosine similarity between the communities inferred by the various algorithms and the ground truth communities in synthetic hypergraphs, with $N=500$ and $E = 2720$. We show results for different numbers of communities $K$ (from left to right). The number of attributes $Z$ is selected to be equal to $K$, and the parameter $\gamma$ is set equal to the fraction $\rho$ of unshuffled attributes. We compare HyCoSBM with Hy-MMSBM, which serves as a baseline that only employs structural information. We also measure the cosine similarity of the attribute matrix $X$ and the ground truth membership matrix $u$ (Only attributes). Lines and shades around them are averages and standard deviations over $10$ different network realisations.
Figure 2: Predicting interactions in close-proximity datasets with partial observations. We show the performance of various methods in hyperedge prediction tasks, measured by AUC, as we vary the fraction of hyperedges made available to the algorithms. This plot shows that the performance of HyCoSBM remains high when fewer hyperedges are available as input, while that of the algorithms that do not use any attributes drops. Lines and shades around them are averages and standard deviations over $5$ cross-validation folds.
Figure 3: Communities detected in a Workplace dataset from partial observations of close-proximity interactions. We vary the fraction of hyperedges given as input to the algorithms (top: $100\%$, bottom: $50\%$) and compare the inferred communities against the attribute department (top left). The AUC barplot (bottom left) shows the performance of the models in hyperedge prediction. Bars and error bars are averages and standard deviations over $5$ cross-validation folds. This plot shows that HyCoSBM is able to use the attributes effectively to keep performance high even at a low fraction of input observations.
Figure 4: AUC on contacts dataset with partial hyperedges: uncorrelated attributes. Using sex and has facebook as the attributes, the performance of all models drops as the hyperedges are removed. Lines and shades around them are averages and standard deviations over $5$ cross-validation folds.
Figure 5: Cosine similarity and AUC in a Gene Disease dataset. A) Cosine similarity between the three types of communities: attribute, HyCoSBM and Hy-MMSBM. B) AUC in predicting missing hyperedges. Bars and error bars are averages and standard deviations over $5$ cross-validation folds. The membership $u$ detected by HyCoSBM correlates with the DPI attribute and achieves higher AUC than both Hy-MMSBM and the model trained with $u$ fixed as the attribute.
...and 4 more figures
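
The cosine similarity reported in the captions of Figures 1 and 5 compares two node-by-community matrices, for example an inferred membership matrix and the ground truth $u$. The sketch below is one plausible way to compute such a score and is not taken from the paper: it assumes both matrices have shape $N \times K$, averages the row-wise cosine similarity over nodes, and searches over column permutations because community labels are arbitrary. The function names `mean_cosine_similarity` and `best_permuted_cosine` are hypothetical, and the exhaustive permutation search is only practical for small $K$.

```python
from itertools import permutations

import numpy as np


def mean_cosine_similarity(u_a, u_b):
    """Average row-wise cosine similarity between two N x K matrices.

    Assumption: rows with zero norm contribute a similarity of 0.
    """
    u_a = np.asarray(u_a, dtype=float)
    u_b = np.asarray(u_b, dtype=float)
    num = np.sum(u_a * u_b, axis=1)
    den = np.linalg.norm(u_a, axis=1) * np.linalg.norm(u_b, axis=1)
    sims = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return float(sims.mean())


def best_permuted_cosine(u_true, u_inferred):
    """Maximise the mean cosine similarity over relabelings of the
    inferred communities (brute force over all K! column permutations)."""
    u_inferred = np.asarray(u_inferred, dtype=float)
    n_communities = u_inferred.shape[1]
    return max(
        mean_cosine_similarity(u_true, u_inferred[:, list(perm)])
        for perm in permutations(range(n_communities))
    )


if __name__ == "__main__":
    # Toy example: three nodes, two communities, labels swapped.
    u_true = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
    u_swapped = u_true[:, ::-1]
    print(mean_cosine_similarity(u_true, u_swapped))
    print(best_permuted_cosine(u_true, u_swapped))
```

In the toy example the two matrices describe the same partition with swapped labels, so the unpermuted score is low while the permutation-matched score recovers the agreement. For larger $K$, a matching algorithm would be a more scalable replacement for the brute-force search.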
