From Graphs to Hypergraphs: Hypergraph Projection and its Remediation
Yanbang Wang, Jon Kleinberg
TL;DR
This work addresses the loss of higher-order information when replacing hypergraphs with their graph projections. It provides a theoretical analysis that identifies two common loss-inducing patterns and proves that exact recovery from projections is generically impossible without extra information. To remediate, the authors propose SHyRe, a learning-based framework that leverages a domain-specific training hypergraph to reconstruct higher-order structures from projections, using a ρ(n,k)-alignment statistic, a budgeted clique sampler, and a feature-rich hyperedge classifier. Empirically, SHyRe substantially outperforms baselines across eight real-world datasets, and enables improved downstream tasks such as protein ranking, link prediction, and clustering, illustrating the practical value of reconstructed hypergraphs as informative intermediate representations.
Abstract
We study the implications of the modeling choice to use a graph, instead of a hypergraph, to represent real-world interconnected systems whose constituent relationships are of higher order by nature. Such a modeling choice typically involves an underlying projection process that maps the original hypergraph onto a graph, and is common in graph-based analysis. While hypergraph projection can potentially lead to loss of higher-order relations, there exist very few studies on the consequences of doing so, or on its remediation. This work fills this gap in two ways: (1) we develop an analysis based on graph and set theory, showing two ubiquitous patterns of hyperedges that are at the root of structural information loss in all hypergraph projections; we also quantify the combinatorial impossibility of recovering the lost higher-order structures if no extra help is provided; (2) we still seek to recover the lost higher-order structures in hypergraph projection, and in light of the findings in (1) we propose to relax the problem into a learning-based setting. Under this setting, we develop a learning-based hypergraph reconstruction method based on an important statistic of hyperedge distributions that we identify. Our reconstruction method is evaluated on 8 real-world datasets under different settings, and exhibits consistently good performance. We also demonstrate the benefits of the reconstructed hypergraphs via use cases in protein ranking and link prediction.
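
To make the information loss concrete, the following is a minimal sketch, assuming the projection is the usual unweighted clique expansion (every pair of nodes that share a hyperedge becomes a graph edge). The abstract does not spell out the projection, so this choice, the function name `project`, and the toy hypergraphs are illustrative assumptions rather than the authors' implementation. The sketch shows two different hypergraphs that project onto the same graph, so the projection alone cannot tell them apart.

```python
from itertools import combinations


def project(hyperedges):
    """Clique-expand a hypergraph (assumed projection): link every pair of
    nodes that co-occur in at least one hyperedge."""
    edges = set()
    for hyperedge in hyperedges:
        # Sort so each undirected edge has one canonical (u, v) form.
        for u, v in combinations(sorted(hyperedge), 2):
            edges.add((u, v))
    return edges


# Hypothetical toy inputs: one 3-node hyperedge vs. three pairwise hyperedges.
triangle_as_one = [{1, 2, 3}]
triangle_as_pairs = [{1, 2}, {2, 3}, {1, 3}]

same_projection = project(triangle_as_one) == project(triangle_as_pairs)
print(same_projection)  # prints True: both project onto the same triangle
```

Both inputs yield the edge set of a triangle, which illustrates why recovering the original hyperedges from a projection needs information beyond the projected graph itself.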
