Hyper-distance Oracles in Hypergraphs

Giulia Preti, Gianmarco De Francisci Morales, Francesco Bonchi

TL;DR

This work introduces HypED, a landmark-based oracle with a predefined size that is built directly on the hypergraph, thus avoiding the materialization of the line graph. It demonstrates the usefulness of the s-distance oracle in two applications: hypergraph-based recommendation and the approximation of the s-closeness centrality of vertices and hyperedges in the context of protein-protein interactions.

Abstract

We study point-to-point distance estimation in hypergraphs, where the query is parameterized by a positive integer s, which defines the level of overlap required for two hyperedges to be considered adjacent. To answer s-distance queries, we first explore an oracle based on the line graph of the given hypergraph and discuss its limitations, the main one being that the line graph is typically orders of magnitude larger than the original hypergraph. We then introduce HypED, a landmark-based oracle with a predefined size, built directly on the hypergraph, thus avoiding the construction of the line graph. Our framework can approximately answer vertex-to-vertex, vertex-to-hyperedge, and hyperedge-to-hyperedge s-distance queries for any value of s. A key observation underlying our framework is that, as s increases, the hypergraph becomes more fragmented. We show how this can be exploited to improve the placement of landmarks by identifying the s-connected components of the hypergraph. For this task, we devise an efficient algorithm based on the union-find technique and a dynamic inverted index. We experimentally evaluate HypED on several real-world hypergraphs and show its versatility in answering s-distance queries for different values of s. Our framework answers such queries in fractions of a millisecond, while allowing fine-grained control of the trade-off between index size and approximation error at creation time. Finally, we demonstrate the usefulness of the s-distance oracle in two applications, namely hypergraph-based recommendation and the approximation of the s-closeness centrality of vertices and hyperedges in the context of protein-protein interactions.
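The fragmentation effect and the union-find idea mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: it assumes two hyperedges are s-adjacent when they share at least s vertices, uses a static (not dynamic) inverted index, and the names `s_connected_components`, `hyperedges`, and `H` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def s_connected_components(hyperedges, s):
    """Group hyperedge ids into s-connected components.

    hyperedges: dict mapping a hyperedge id to a set of vertices.
    Assumption: two hyperedges are s-adjacent when they share >= s vertices.
    Hyperedges with no s-adjacent neighbor end up as singleton components.
    """
    parent = {e: e for e in hyperedges}

    def find(e):
        # Path halving keeps the union-find trees shallow.
        while parent[e] != e:
            parent[e] = parent[parent[e]]
            e = parent[e]
        return e

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Inverted index: vertex -> hyperedges that contain it.
    index = defaultdict(list)
    for e, vertices in hyperedges.items():
        for v in vertices:
            index[v].append(e)

    # Count shared vertices only for pairs that co-occur in a posting list.
    overlap = defaultdict(int)
    for posting in index.values():
        for a, b in combinations(sorted(posting), 2):
            overlap[(a, b)] += 1

    # Merge every pair whose overlap reaches the threshold s.
    for (a, b), shared in overlap.items():
        if shared >= s:
            union(a, b)

    components = defaultdict(set)
    for e in hyperedges:
        components[find(e)].add(e)
    return sorted(components.values(), key=lambda c: sorted(c))


# Toy hypergraph (made up for illustration).
H = {"e1": {1, 2, 3}, "e2": {2, 3, 4}, "e3": {4, 5}, "e4": {5, 6}}
for s in (1, 2):
    print(s, s_connected_components(H, s))
```

On this toy input the four hyperedges form a single component for s = 1 and split into three components for s = 2, which is the kind of fragmentation the abstract describes. The pairwise counting step can be quadratic in the length of a posting list, so this sketch is only meant for small examples.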

Paper Structure (17 sections, 9 equations, 13 figures, 9 tables, 3 algorithms)


Figures (13)

  • Figure 1: A hypergraph (left) and its line graph (right; see the line graph section), augmented with hexagonal vertices and dashed edges to keep track of vertex-to-hyperedge membership.
  • Figure 2: Number of $s$-connected components $|\textsf{CC}|$ (normalized to the number of hyperedges $|E|$) as a function of $s$. Lower values indicate higher connectivity.
  • Figure 3: A hypergraph (left) and two $2$-distance oracles (right), each with $1$ landmark, selected using the degree ($e_4$) and the betweenness ($e_7$) landmark selection strategies, respectively. The oracles include the $2$-distances from the landmark to all the hyperedges reachable via $2$-paths.
  • Figure 4: Time required to find the $s$-connected components for $s \in [1,10]$ using LG, CCS-IS, and CCS-SW.
  • Figure 5: Oracle building time (top) and MAE (bottom) using the Sampling landmark assignment strategy for different combinations of importance factors $(\alpha, \beta)$. The charts report mean and standard deviation among the values for the degree, farthest, and bestcover landmark selection strategies.
  • ...and 8 more figures

Theorems & Definitions (8)

  • Example 1
  • Definition 1: $s$-adjacent, $s$-walk, $s$-path (Aksoy et al., 2020)
  • Definition 2: $s$-connected components (Aksoy et al., 2020)
  • Definition 3: vertex-to-vertex $s$-distance
  • Definition 4: vertex-to-hyperedge $s$-distance
  • Definition 5: line graph (Berge, 1984)
  • Example 2
  • Example 3
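
To make the notions of $s$-adjacency, $s$-distance, and landmark oracle listed above more concrete, here is a minimal Python sketch for hyperedge-to-hyperedge queries. It assumes that two hyperedges are $s$-adjacent when they share at least $s$ vertices and that the $s$-distance is the length of a shortest $s$-walk. The estimator shown is the standard landmark upper bound from the triangle inequality, which is not necessarily the exact estimator used by HypED. All function names and the toy hypergraph are hypothetical.

```python
from collections import deque


def s_neighbors(hyperedges, e, s):
    """Hyperedges that share at least s vertices with e (assumed s-adjacency)."""
    return [f for f in hyperedges
            if f != e and len(hyperedges[e] & hyperedges[f]) >= s]


def s_distances_from(hyperedges, source, s):
    """Exact s-distances from source to every s-reachable hyperedge, via BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        e = queue.popleft()
        for f in s_neighbors(hyperedges, e, s):
            if f not in dist:
                dist[f] = dist[e] + 1
                queue.append(f)
    return dist


def landmark_upper_bound(oracle, e, f):
    """Smallest d(e, l) + d(l, f) over landmarks l that reach both e and f.

    oracle: dict mapping each landmark to its distance table.
    Returns infinity when no landmark reaches both hyperedges.
    """
    best = float("inf")
    for dist in oracle.values():
        if e in dist and f in dist:
            best = min(best, dist[e] + dist[f])
    return best


# Toy hypergraph (made up): consecutive hyperedges share exactly 2 vertices.
H = {"e1": {1, 2, 3}, "e2": {2, 3, 4}, "e3": {3, 4, 5}, "e4": {4, 5, 6}}

exact = s_distances_from(H, "e1", 2)       # exact 2-distances from e1
oracle = {"e1": exact}                     # a single landmark, e1
estimate = landmark_upper_bound(oracle, "e2", "e3")
print(exact, estimate)
```

With the single landmark `e1`, the estimate for the pair (`e2`, `e3`) is 1 + 2 = 3, whereas the exact 2-distance is 1. This shows why landmark placement matters: a landmark that lies on or near a shortest path gives a tighter bound, and a landmark in a different $s$-connected component gives no bound at all.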