Multi-View Structural Graph Summaries

Jonatan Frank; Andor Diera; David Richerby; Ansgar Scherp

TL;DR

The work tackles the merging of multi-view structural graph summaries by introducing a model-agnostic algorithm that preserves schema-consistent equivalence classes (EQCs) while integrating their payloads. The authors prove a worst-case bound of $O(|E|^2)$ for merging two summaries, yet show that, under practical assumptions and with hash-based data structures, the time can be reduced to $O(|E|)$. On three large-scale RDF datasets spanning web graphs, source-code graphs, and news articles, they demonstrate that merge time correlates strongly with the number of edges in the two summaries and that always merging the two smallest summaries first consistently yields the best performance. The results indicate that the approach scales and generalizes to other summary models, offering a practical path to faster graph-based tasks in multi-view contexts.
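As a rough illustration of how hash-based structures keep a pairwise merge close to linear, the sketch below merges two summaries under an attribute-collection-style model, where a vertex's EQC is its set of outgoing edge labels. It is not the authors' algorithm: the names `summarize_ac` and `merge_ac`, and the choice of storing member vertices as the payload, are assumptions made for this example.

```python
from collections import defaultdict

def summarize_ac(edges):
    """Group vertices by their set of outgoing edge labels.

    `edges` is an iterable of (subject, label, object) triples.
    Returns a dict: EQC (frozenset of labels) -> set of member vertices.
    The member set stands in for the payload (an assumption of this sketch).
    """
    labels = defaultdict(set)
    for subject, label, _obj in edges:
        labels[subject].add(label)
    eqcs = defaultdict(set)
    for vertex, label_set in labels.items():
        eqcs[frozenset(label_set)].add(vertex)
    return dict(eqcs)

def merge_ac(summary_a, summary_b):
    """Merge two summaries built by `summarize_ac`.

    A vertex that appears in both views gets the union of its label sets,
    so it may move to a new EQC. Every lookup goes through a dict, so each
    member is touched a constant number of times.
    """
    eqc_of = {}
    for eqc, members in summary_a.items():
        for vertex in members:
            eqc_of[vertex] = eqc
    for eqc, members in summary_b.items():
        for vertex in members:
            # Union with the EQC from the first view, if the vertex is shared.
            eqc_of[vertex] = eqc_of.get(vertex, frozenset()) | eqc
    merged = defaultdict(set)
    for vertex, eqc in eqc_of.items():
        merged[eqc].add(vertex)
    return dict(merged)

# Two tiny views that share the vertex "t".
view_1 = [("t", "p", "u"), ("v", "q", "w")]
view_2 = [("t", "q", "x"), ("y", "p", "z")]
merged = merge_ac(summarize_ac(view_1), summarize_ac(view_2))
```

In this toy setting, merging the two summaries gives the same result as summarizing the union of both views, which is the property a merge is expected to preserve.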

Abstract

A structural graph summary is a small graph representation that preserves structural information necessary for a given task. The summary is used instead of the original graph to complete the task faster. We introduce multi-view structural graph summaries and propose an algorithm for merging two summaries. We conduct a theoretical analysis of our algorithm. We run experiments on three datasets, contributing two new ones. The datasets come from different domains (web graph, source code, and news) and differ in size; the interpretation of multi-view depends on the domain: the views are pay-level domains on the web, control vs. data flow in the code, and news broadcasters. We experiment with three graph summary models: attribute collection, class collection, and their combination. We observe that merging two structural summaries has an upper bound of quadratic complexity, but under reasonable assumptions, it has linear-time worst-case complexity. The running time of merging has a strong linear correlation with the number of edges in the two summaries. Therefore, the experiments support the assumption that the upper bound of quadratic complexity is not tight and that linear complexity is possible. Furthermore, our experiments show that always merging the two smallest summaries by the number of edges is the most efficient strategy for merging multiple structural summaries.
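The smallest-first strategy for merging multiple summaries can be sketched with a min-heap keyed on edge count. This is a minimal sketch under stated assumptions: each summary is modeled as a plain set of edges, `merge` is a placeholder that takes their union rather than the paper's pairwise merge, and the cost of one merge is assumed to be the number of edges in the two inputs.

```python
import heapq
from itertools import count

def merge(a, b):
    """Placeholder pairwise merge (assumption): union of two edge sets."""
    return a | b

def merge_smallest_first(summaries):
    """Repeatedly merge the two summaries with the fewest edges.

    Returns the merged summary and the total cost, where one merge is
    assumed to cost the sum of the edge counts of its two inputs.
    """
    if not summaries:
        raise ValueError("need at least one summary")
    tie_breaker = count()  # keeps heap entries comparable when sizes are equal
    heap = [(len(s), next(tie_breaker), s) for s in summaries]
    heapq.heapify(heap)
    total_cost = 0
    while len(heap) > 1:
        size_a, _, a = heapq.heappop(heap)
        size_b, _, b = heapq.heappop(heap)
        total_cost += size_a + size_b
        merged = merge(a, b)
        heapq.heappush(heap, (len(merged), next(tie_breaker), merged))
    return heap[0][2], total_cost

# Four disjoint toy summaries with 10, 1, 3, and 2 edges.
views = [
    frozenset((f"v{i}", "p", j) for j in range(n))
    for i, n in enumerate([10, 1, 3, 2])
]
result, cost = merge_smallest_first(views)
```

With these toy sizes, the large summary is only touched in the final merge, so its edges are processed once instead of once per merge step.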

Paper Structure (26 sections, 3 figures, 4 tables, 2 algorithms)

Introduction
Related Work
Problem Formalization
Definition of Graph, Summary Graph, and Multi-views
Graph Summarization using Equivalence Relations
Merging Graph Summaries
Algorithm for Pairwise Merging
Algorithm
Complexity Analysis
Merging $n$ Summaries
Experimental Apparatus
Datasets
Billion Triple Challenge 2019 Dataset
CRAN Social Science Three Views Dataset
International News Coverage 2023 Dataset
...and 11 more sections

Figures (3)

Figure 1: The problem of merging two graph summaries (views). The vertex color represents the EQC under the summary model AC, i.e., the set of outgoing edges. The vertices $t$, $v$, and $x$ appear in both views.
Figure 2: Merging two summaries. The color of the vertices indicates the EQC. Payload vertices are omitted for clarity. Here, the payload is directly attached to the EQC.
Figure 3: Regression on the various summaries and the datasets. The row indicates the dataset, and the column indicates the summary model. For each regression, the caption contains the function and the coefficient of determination $R^2$ of $|E|$, $|E|\log (|E|)$, and $|E|^2$. The color shows to which line in the plot it belongs. All p-values are lower than 0.05.
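
The comparison described in the Figure 3 caption, fitting merge time against $|E|$, $|E|\log(|E|)$, and $|E|^2$ and comparing $R^2$, can be sketched as follows. The function names are hypothetical and the timings are synthetic values generated from a linear formula purely for illustration; they are not measurements from the paper.

```python
import math

def r_squared(xs, ys):
    """R^2 of an ordinary least-squares fit ys ~ a * xs + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

def compare_growth_models(edge_counts, times):
    """Fit time against three transformed features and report R^2 for each."""
    features = {
        "|E|": [float(e) for e in edge_counts],
        "|E|log|E|": [e * math.log(e) for e in edge_counts],
        "|E|^2": [float(e) ** 2 for e in edge_counts],
    }
    return {name: r_squared(xs, times) for name, xs in features.items()}

# Synthetic, exactly linear timings (illustrative only, not measured data).
edge_counts = [100, 200, 400, 800, 1600]
times = [2e-6 * e + 1e-3 for e in edge_counts]
scores = compare_growth_models(edge_counts, times)
```

On data that is truly linear in $|E|$, the linear feature yields the highest $R^2$; the other two features fit less well.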
