Multi-View Structural Graph Summaries

Jonatan Frank, Andor Diera, David Richerby, Ansgar Scherp

TL;DR

The paper addresses merging multi-view structural graph summaries with a model-agnostic algorithm that preserves schema-consistent equivalence classes (EQCs) while integrating payloads. The authors prove a worst-case bound of $O(|E|^2)$ for merging two summaries and show that, under practical assumptions and with hash-based data structures, the time reduces to $O(|E|)$. On three large-scale RDF datasets spanning web graphs, source-code graphs, and news articles, merge time correlates strongly with the number of edges, and always merging the two smallest summaries first consistently performs best. The results indicate that the approach is scalable and generalizes to other summary models, offering a practical path to faster graph-based tasks in multi-view settings.

Abstract

A structural graph summary is a small graph representation that preserves structural information necessary for a given task. The summary is used instead of the original graph to complete the task faster. We introduce multi-view structural graph summaries and propose an algorithm for merging two summaries. We conduct a theoretical analysis of our algorithm. We run experiments on three datasets, contributing two new ones. The datasets are of different domains (web graph, source code, and news) and sizes. The interpretation of multi-view depends on the domain: the views are pay-level domains on the web, control vs. data flow of the code, and news broadcasters. We experiment with three graph summary models: attribute collection, class collection, and their combination. We observe that merging two structural summaries has an upper bound of quadratic complexity, but under reasonable assumptions, it has linear-time worst-case complexity. The running time of merging has a strong linear correlation with the number of edges in the two summaries. Therefore, the experiments support the assumption that the upper bound of quadratic complexity is not tight and that linear complexity is possible. Furthermore, our experiments show that always merging the two smallest summaries by the number of edges is the most efficient strategy for merging multiple structural summaries.
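The strategy of always merging the two smallest summaries by edge count can be sketched with a min-heap. This is a minimal illustration, not the authors' implementation: summaries are modelled as plain sets of edge triples, `merge_two` is a hypothetical stand-in (a set union) for the paper's merge algorithm, and the cost of one merge is assumed to be the combined edge count of its two inputs, in line with the reported linear correlation.

```python
import heapq
from itertools import count


def merge_two(a, b):
    """Hypothetical stand-in for the merge step: a plain union of edge sets.

    The paper's algorithm also maintains EQCs and payloads; that is omitted here.
    """
    return a | b


def merge_smallest_first(summaries):
    """Repeatedly merge the two summaries with the fewest edges.

    Returns the merged summary and the total assumed cost, where one merge
    is charged the combined edge count of its two inputs.
    """
    if not summaries:
        return frozenset(), 0
    tie = count()  # tie-breaker so the heap never has to compare edge sets
    heap = [(len(s), next(tie), frozenset(s)) for s in summaries]
    heapq.heapify(heap)
    total_cost = 0
    while len(heap) > 1:
        size_a, _, a = heapq.heappop(heap)
        size_b, _, b = heapq.heappop(heap)
        total_cost += size_a + size_b
        merged = merge_two(a, b)
        heapq.heappush(heap, (len(merged), next(tie), merged))
    return heap[0][2], total_cost


# Three toy views given as (subject, label, object) edges.
views = [
    {(3, "r", 4), (4, "s", 5), (5, "t", 6)},
    {(1, "p", 2)},
    {(1, "p", 2), (2, "q", 3)},
]
merged_summary, cost = merge_smallest_first(views)
```

Because shared edges collapse during a merge, the size pushed back onto the heap is the size of the merged result rather than the sum of the input sizes.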

Paper Structure (26 sections, 3 figures, 4 tables, 2 algorithms)

Figures (3)

  • Figure 1: The problem of merging two graph summaries (views). The vertex color represents the EQC under the summary model AC, i.e., the set of outgoing edges. The vertices $t$, $v$, and $x$ appear in both views.
  • Figure 2: Merging two summaries. The color of the vertices indicates the EQC. Payload vertices are omitted for clarity. Here, the payload is directly attached to the EQC.
  • Figure 3: Regression on the various summaries and the datasets. Rows correspond to datasets and columns to summary models. For each regression, the caption contains the fitted function and the coefficient of determination $R^2$ for $|E|$, $|E|\log(|E|)$, and $|E|^2$. Colors match each fitted function to its line in the plot. All p-values are lower than 0.05.
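The caption of Figure 1 describes the EQC of a vertex under the AC model as its set of outgoing edges. A minimal sketch of that grouping follows, assuming that "set of outgoing edges" means the set of outgoing edge labels and that a view is given as (subject, label, object) triples. The function name `attribute_collection` and the toy edges are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def attribute_collection(edges):
    """Group vertices by the set of labels on their outgoing edges.

    Assumption: two vertices share an EQC exactly when these label sets
    are equal. Vertices without outgoing edges are not listed.
    """
    out_labels = defaultdict(set)
    for subject, label, _obj in edges:
        out_labels[subject].add(label)

    eqcs = defaultdict(set)
    for vertex, labels in out_labels.items():
        eqcs[frozenset(labels)].add(vertex)
    return dict(eqcs)


# Toy view: t and v have the same outgoing labels, x has only one of them.
view = [
    ("t", "name", "a"),
    ("t", "knows", "v"),
    ("v", "name", "b"),
    ("v", "knows", "x"),
    ("x", "name", "c"),
]
classes = attribute_collection(view)
```

In this toy view, `t` and `v` fall into one class and `x` into another. A vertex that appears in two views may belong to different classes in each view, which is the situation a merge has to reconcile.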