MARIOH: Multiplicity-Aware Hypergraph Reconstruction

Kyuhan Lee, Geon Lee, Kijung Shin

TL;DR

MARIOH addresses the challenge of recovering the original hypergraph from a projected graph by leveraging edge multiplicity in a supervised framework. It combines theoretically guaranteed filtering that identifies size-2 hyperedges, a multiplicity-aware clique classifier, and a bidirectional search that robustly identifies higher-order hyperedges. Across 10 real-world datasets, MARIOH outperforms eight baselines in reconstruction accuracy, transferability, and downstream task performance, while maintaining scalable runtimes. The approach demonstrates practical impact in clustering, classification, and link prediction tasks and offers storage savings by restoring higher-order structure.

Abstract

Hypergraphs offer a powerful framework for modeling higher-order interactions that traditional pairwise graphs cannot fully capture. However, practical constraints often lead to their simplification into projected graphs, resulting in substantial information loss and ambiguity in representing higher-order relationships. In this work, we propose MARIOH, a supervised approach for reconstructing the original hypergraph from its projected graph by leveraging edge multiplicity. To overcome the difficulties posed by the large search space, MARIOH integrates several key ideas: (a) identifying provable size-2 hyperedges, which reduces the candidate search space, (b) predicting the likelihood of candidates being hyperedges by utilizing both structural and multiplicity-related features, and (c) not only targeting promising hyperedge candidates but also examining less confident ones to explore alternative possibilities. Together, these ideas enable MARIOH to efficiently and effectively explore the search space. In our experiments using 10 real-world datasets, MARIOH achieves up to 74.51% higher reconstruction accuracy compared to state-of-the-art methods.
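
To make the notion of a projected graph with edge multiplicity concrete, the following is a minimal sketch, assuming the usual clique-expansion reading in which every hyperedge adds 1 to the multiplicity of each node pair it contains. The function name `project` and the toy hyperedges are illustrative and not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def project(hyperedges):
    """Return the edge multiplicities of the projected graph.

    Assumption: each hyperedge contributes 1 to the multiplicity of
    every unordered node pair it contains (clique expansion).
    """
    weights = Counter()
    for hyperedge in hyperedges:
        for u, v in combinations(sorted(set(hyperedge)), 2):
            weights[(u, v)] += 1
    return weights


# Toy hypergraph (hypothetical data).
hyperedges = [{1, 2, 3}, {1, 2}, {2, 3, 4}]
weights = project(hyperedges)

# Two different hypergraphs that share the same unweighted projection
# but are told apart once multiplicities are kept.
a = project([{1, 2, 3}])
b = project([{1, 2, 3}, {1, 2}])
same_unweighted = set(a) == set(b)  # same set of edges
same_weighted = a == b              # different multiplicities
```

Multiplicity does not remove all ambiguity: a single hyperedge `{1, 2, 3}` and the three pairs `{1, 2}`, `{1, 3}`, `{2, 3}` still project to identical multiplicities, which is why a learned classifier and a search procedure are needed on top of it.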

Paper Structure

This paper contains 19 sections, 4 theorems, 9 equations, 7 figures, 9 tables, 3 algorithms.

Key Result

Lemma 1

$\text{MHH}(u, v)$ is an upper bound on the total number of higher-order hyperedges that include both $u$ and $v$.
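
This summary does not reproduce the definition of $\text{MHH}(u, v)$, so the sketch below rests on an assumption: it uses one quantity that has the stated property, namely the sum over third nodes $z$ of $\min(\omega(u, z), \omega(v, z))$, where $\omega$ denotes edge multiplicity in the projected graph. The reasoning is that any hyperedge of size three or more containing $u$ and $v$ also contains some $z$, and it adds 1 to both $\omega(u, z)$ and $\omega(v, z)$. Subtracting the bound from $\omega(u, v)$ then gives a number of size-2 hyperedges that must exist between $u$ and $v$. All function names are hypothetical.

```python
def multiplicity(weights, a, b):
    """Edge multiplicity of the unordered pair {a, b} (0 if absent)."""
    return weights.get((min(a, b), max(a, b)), 0)


def higher_order_upper_bound(weights, u, v):
    """Assumed stand-in for MHH(u, v): an upper bound on the number of
    hyperedges of size >= 3 that contain both u and v."""
    nodes = {n for pair in weights for n in pair}
    return sum(
        min(multiplicity(weights, u, z), multiplicity(weights, v, z))
        for z in nodes
        if z != u and z != v
    )


def guaranteed_size2(weights, u, v):
    """Number of size-2 hyperedges {u, v} that must exist."""
    bound = higher_order_upper_bound(weights, u, v)
    return max(0, multiplicity(weights, u, v) - bound)


# Multiplicities of the projection of {1, 2}, {1, 2}, {1, 2, 3}
# (hypothetical toy data).
weights = {(1, 2): 3, (1, 3): 1, (2, 3): 1}

bound_12 = higher_order_upper_bound(weights, 1, 2)
fixed_12 = guaranteed_size2(weights, 1, 2)
fixed_13 = guaranteed_size2(weights, 1, 3)
```

In this toy case the bound for the pair (1, 2) is 1, so two of its three units of multiplicity can only come from size-2 hyperedges, while nothing is guaranteed for the pair (1, 3).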

Figures (7)

  • Figure 1: Reduction of candidate space using edge multiplicity. When edge multiplicities are known (top and middle rows), the number of potential outputs is significantly reduced compared to cases with unknown multiplicity (bottom row). The known edge multiplicities (e.g., multiplicity 1 and 2) constrain the possible hyperedge structures, limiting the search space and enabling more accurate reconstruction. In contrast, the absence of edge multiplicity information leads to an explosion of candidates, including infinitely many possibilities, complicating the reconstruction process.
  • Figure 2: Example of hypergraph reconstruction on a co-authorship dataset. (a): In the ground-truth hypergraph, each hyperedge represents a set of researchers who co-authored a paper. We focus on the visualized sub-hypergraph $\mathcal{H}$, induced by Jure Leskovec and ten of his co-authors chosen at random. (b): As input, the graph representation $G$ of $\mathcal{H}$ is given, where each edge indicates the number of co-authored papers between researchers. (c): SHyRe-Count [wang2024from] recovers a subset of ground-truth hyperedges in $\mathcal{H}$ along with some false positives (Jaccard similarity $= 0.333$). (d): In contrast, our proposed method, MARIOH, exactly restores $\mathcal{H}$ (Jaccard and multi-Jaccard similarity $= 1.000$). Refer to Section \ref{exp:case} for detailed settings, and refer to the online appendix [supple] for more case studies on the Host-virus and Crimes datasets.
  • Figure 3: Example procedure of MARIOH. From a projected graph $\mathcal{G}$ (top), MARIOH reconstructs a hypergraph $\widehat{\mathcal{H}}$ (bottom). First, MARIOH identifies edges in $\mathcal{G}$ that are theoretically guaranteed to correspond to size-$2$ hyperedges. These edges are directly incorporated into $\widehat{\mathcal{H}}$ and removed from $\mathcal{G}$. Then, MARIOH identifies the maximal cliques in $\mathcal{G}$ and predicts the likelihood of each clique using a classifier trained on multiplicity-aware features. Based on the predicted likelihood, MARIOH employs a bidirectional search that identifies as hyperedges (1) cliques with high estimated likelihood (e.g., (A) $\{5,6,7\}$ and (B) $\{2,3,5,6\}$) and (2) sub-cliques that have high likelihood (e.g., (C') $\{6,11\}$) but are hidden within larger low-likelihood cliques (e.g., (C) $\{6,10,11\}$). However, clique (B) $\{2,3,5,6\}$ is not identified as a hyperedge since it no longer exists in the updated projected graph after removing (A) $\{5,6,7\}$. This is repeated until no edges remain in $\mathcal{G}$.
  • Figure 4: Hyperparameter sensitivity analysis. MARIOH is robust to variations in the hyperparameters $\alpha$, $r$, and $\theta_\text{init}$, in both the multiplicity-reduced setting (above) and the multiplicity-preserved setting (below).
  • Figure 5: Average runtime of MARIOH and competitors. While MARIOH is slower than basic baselines, it takes less time on average than recent advanced methods (i.e., SHyRe-Motif and SHyRe-Count). For runtimes on individual datasets, see Fig. \ref{exp:run_compare} and the online appendix [supple].
  • ...and 2 more figures
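
The Figure 2 caption reports Jaccard and multi-Jaccard similarity between the reconstructed and ground-truth hypergraphs. The sketch below shows how such scores could be computed, assuming Jaccard is taken over sets of distinct hyperedges and multi-Jaccard over multisets (sum of minimum counts divided by sum of maximum counts). These definitions are assumptions based on common usage, and the data are hypothetical.

```python
from collections import Counter


def jaccard(true_edges, pred_edges):
    """Jaccard similarity over distinct hyperedges (duplicates ignored)."""
    t = {frozenset(e) for e in true_edges}
    p = {frozenset(e) for e in pred_edges}
    union = t | p
    return len(t & p) / len(union) if union else 1.0


def multi_jaccard(true_edges, pred_edges):
    """Multiset Jaccard: repeated hyperedges count with multiplicity."""
    t = Counter(frozenset(e) for e in true_edges)
    p = Counter(frozenset(e) for e in pred_edges)
    intersection = sum((t & p).values())  # element-wise minimum
    union = sum((t | p).values())         # element-wise maximum
    return intersection / union if union else 1.0


# Hypothetical ground truth and reconstruction.
truth = [{1, 2, 3}, {1, 2}, {1, 2}]
predicted = [{1, 2, 3}, {1, 2}, {2, 3}]

score = jaccard(truth, predicted)
multi_score = multi_jaccard(truth, predicted)
```

The multiset variant penalizes the reconstruction for recovering `{1, 2}` only once, which the set-based score cannot detect.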

Theorems & Definitions (4)

  • Lemma 1: Upper Bound of Higher-Order Hyperedges
  • Lemma 2: Lower Bound on Size-2 Hyperedges
  • Lemma 3: Time Complexity of Algorithm \ref{algo:filtering}
  • Lemma 4: Time Complexity of Algorithm \ref{algo:bidirection}