Scalable Sample-to-Population Estimation of Hyperbolic Space Models for Hypergraphs

Cornelius Fritz, Yubai Yuan, Michael Schweinberger

TL;DR

A statistical framework is developed that enables scalable estimation, simulation, and model assessment of hypergraph models and provides non-asymptotic and asymptotic theoretical guarantees for learning hyperbolic space models based on samples from a population hypergraph.

Abstract

Hypergraphs are useful mathematical representations of overlapping and nested subsets of interacting units, including groups of genes or brain regions, economic cartels, political or military coalitions, and groups of products that are purchased together. Despite the vast range of applications, the statistical analysis of hypergraphs is challenging: There are many hyperedges of small and large sizes, and hyperedges can overlap or be nested. Existing approaches to hypergraphs are either not scalable or achieve scalability at the expense of model realism. We develop a statistical framework that enables scalable estimation, simulation, and model assessment of hypergraph models, which is supported by non-asymptotic and asymptotic theoretical guarantees. First, we introduce a novel model of hypergraphs capturing core-periphery structure in addition to proximity, by embedding units in an unobserved hyperbolic space. Second, we achieve scalability by developing manifold optimization algorithms for learning hyperbolic space models based on samples from a population hypergraph. Third, we provide non-asymptotic and asymptotic theoretical guarantees for learning hyperbolic space models based on samples from a population hypergraph. We use the proposed statistical framework to detect core-periphery structure along with proximity among U.S. politicians based on historical media reports.

Paper Structure

This paper contains 34 sections, 6 theorems, 126 equations, 16 figures.

Key Result

Proposition 1

(Identifiability) The Gram matrix $\bm{D} \coloneqq \bm{\Theta}\, \bm{J}\, \bm{\Theta}^\top$ and the sparsity parameters $\alpha_2, \ldots, \alpha_K$ are identifiable provided that $N > r+2$.
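
The following is a minimal numerical sketch of why identifiability is naturally stated for the Gram matrix $\bm{D}$ rather than for the positions themselves. It rests on assumptions that this summary does not spell out: that $\bm{\Theta}$ is an $N \times (r+1)$ matrix of positions in the Lorentz model of hyperbolic space, and that $\bm{J} = \mathrm{diag}(-1, 1, \ldots, 1)$ (the paper's sign convention may differ). The helper names `lift_to_hyperboloid`, `gram`, and `lorentz_boost` are hypothetical and used for illustration only. The sketch does not demonstrate the condition $N > r+2$.

```python
import numpy as np

def lift_to_hyperboloid(x):
    """Lift Euclidean points (N x r) onto the hyperboloid in R^(r+1).

    Assumed convention: the first coordinate is sqrt(1 + ||x||^2),
    so each lifted point theta satisfies theta^T J theta = -1.
    """
    x0 = np.sqrt(1.0 + np.sum(x**2, axis=1, keepdims=True))
    return np.hstack([x0, x])

def gram(theta, J):
    """Gram matrix D = Theta J Theta^T of Lorentzian inner products."""
    return theta @ J @ theta.T

def lorentz_boost(r, rapidity):
    """Lorentz boost mixing coordinate 0 with coordinate 1; satisfies L^T J L = J."""
    L = np.eye(r + 1)
    c, s = np.cosh(rapidity), np.sinh(rapidity)
    L[0, 0], L[0, 1], L[1, 0], L[1, 1] = c, s, s, c
    return L

rng = np.random.default_rng(0)
N, r = 6, 2                                   # illustrative sizes only
J = np.diag([-1.0] + [1.0] * r)               # assumed signature of J

theta = lift_to_hyperboloid(rng.normal(size=(N, r)))
theta_moved = theta @ lorentz_boost(r, 0.7).T  # apply an isometry to every position

D = gram(theta, J)
D_moved = gram(theta_moved, J)

gram_unchanged = np.allclose(D, D_moved)               # D is invariant under the isometry
positions_changed = not np.allclose(theta, theta_moved)  # the positions are not

# Under the assumed convention, pairwise hyperbolic distances depend on D alone.
dist = np.arccosh(np.clip(-D, 1.0, None))
```

Under these assumptions, two different position matrices yield the same $\bm{D}$ and therefore the same pairwise distances, which suggests that the positions can only be recovered up to such transformations.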

Figures (16)

  • Figure 1: A hypergraph with five U.S. politicians, which is a subgraph of the hypergraph with 678 U.S. politicians described in the application section. There are ten hyperedges of size two, five hyperedges of size three, and one hyperedge of size four, represented by colored contours.
  • Figure 2: Hyperbolic space: (a) Poincaré disk, including an embedded tree with root $A$ close to the center of the Poincaré disk and leaves $H$, $I$, $J$, $K$, $L$, $M$, $N$, $O$ close to the boundary of the Poincaré disk. (b) Equivalent representations of hyperbolic space: Poincaré disk and Lorentz model. The segment $A'_1$--$A'_2$ is the projection of the segment $A_1$--$A_2$ in the Lorentz model onto the Poincaré disk.
  • Figure 3: Simulation results: error of estimating sparsity parameter vector $\bm{\alpha} \coloneqq (\alpha_2, \ldots, \alpha_K)^\top$ and Gram matrix $\bm{D}$ as a function of the number of controls $n$ (the number of unrealized hyperedges sampled for each realized hyperedge) and the number of units $N$.
  • Figure 4: Newswire data: Estimated positions of politicians on the Poincaré disk, with party affiliation indicated by color; politicians who switched parties are labeled "both." The origin of the Poincaré disk is represented by $\textcolor{#F5D57D}{\times}$. "Watergate Republicans" refers to the Watergate-era Republicans Richard Nixon, Henry Kissinger, Melvin Laird, Barry Goldwater, and Nelson Rockefeller, whose positions are almost indistinguishable.
  • Figure 5: Newswire data: distances of politicians to the center of (a) hyperbolic space or (b) Euclidean space plotted against eigenvector centrality scores. The blue lines in (a) and (b) represent the smoothed averages of the distance to the center by eigenvector centrality score. The $R^2$ value is calculated from the accuracy of using the blue line for predicting the distance to the center based on the eigenvector centrality. (c) Total variation distance between the size-$k$ degree distributions of observed and 100 simulated hypergraphs.
  • ...and 11 more figures

Theorems & Definitions (6)

  • Proposition 1
  • Theorem 1
  • Theorem 2
  • Proposition 1
  • Theorem 1
  • Theorem 2