
On the External Validity of Average-Case Analyses of Graph Algorithms

Thomas Bläsius, Philipp Fischbeck

TL;DR

The paper asks whether average-case analyses of graph algorithms translate to real-world performance. It studies the question systematically on generated graphs with controllable locality and heterogeneity and on a large collection of sparse real-world networks. The results show strong external validity for several core algorithms: locality and heterogeneity largely determine practical performance, and model-based insights often carry over to real networks. The authors give detailed definitions of heterogeneity and locality, compare multiple network models (ER, Chung–Lu, GIRG), and evaluate six algorithms (bidirectional BFS, diameter estimation, vertex cover domination, Louvain clustering, maximal clique enumeration, and chromatic-number reduction). The findings suggest that measuring and controlling locality and heterogeneity yields meaningful predictions for real-world inputs, with practical implications for algorithm design and benchmarking. The authors advocate the continued use of locality-focused models such as GIRGs to study average-case behavior.

Abstract

The number one criticism of average-case analysis is that we do not actually know the probability distribution of real-world inputs. Thus, analyzing an algorithm on some random model has no implications for practical performance. At its core, this criticism doubts the existence of external validity, i.e., it assumes that algorithmic behavior on the somewhat simple and clean models does not translate beyond the models to practical performance on real-world inputs. With this paper, we provide a first step towards studying the question of external validity systematically. To this end, we evaluate the performance of six graph algorithms on a collection of 2740 sparse real-world networks depending on two properties: the heterogeneity (variance in the degree distribution) and the locality (tendency of edges to connect vertices that are already close). We compare this with the performance on generated networks with varying locality and heterogeneity. We find that the performance in the idealized setting of network models translates surprisingly well to real-world networks. Moreover, heterogeneity and locality appear to be the core properties impacting the performance of many graph algorithms.
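
To make the notion of heterogeneity more concrete, the following is a minimal sketch that measures the spread of a degree distribution. It assumes, purely for illustration, that heterogeneity is taken as the base-10 logarithm of the coefficient of variation (standard deviation divided by mean) of the degrees. The abstract only says "variance in the degree distribution", so the exact formula and the function names `degree_sequence` and `heterogeneity` are assumptions, not the authors' code.

```python
import math
from collections import Counter


def degree_sequence(edges):
    """Degrees of an undirected simple graph given as a list of (u, v) pairs.

    Vertices without any edge do not appear in the edge list and are
    therefore not counted in this sketch.
    """
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return list(deg.values())


def heterogeneity(edges):
    """Assumed measure: log10 of the coefficient of variation of the degrees.

    Returns -inf for a graph in which all degrees are equal.
    """
    degs = degree_sequence(edges)
    mean = sum(degs) / len(degs)
    # Population variance of the degree sequence.
    var = sum((d - mean) ** 2 for d in degs) / len(degs)
    if var == 0:
        return float("-inf")
    return math.log10(math.sqrt(var) / mean)


# A star has one high-degree hub; a cycle is perfectly regular.
star = [(0, i) for i in range(1, 5)]
cycle = [(i, (i + 1) % 5) for i in range(5)]
print(heterogeneity(star), heterogeneity(cycle))
```

Under this assumed measure, the star scores higher than the cycle, matching the intuition that a graph with hubs is more heterogeneous than a regular one.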

Paper Structure (65 sections, 2 theorems, 9 equations, 17 figures, 2 tables)


Key Result

Lemma 1

The average distance locality $L_{\mathrm{dist}}(G)$ is

Figures (17)

  • Figure 1: The density (kernel density estimation) of heterogeneity, degree locality, distance locality, and locality of the networks in our data set of real-world networks.
  • Figure 2: Comparison of the average local clustering coefficient to the degree locality (left), the distance locality (center), and the locality (right). Each dot represents one network from our data set of real-world networks.
  • Figure 3: Heterogeneity and locality of the generated networks from the different models. Each point is the average of five samples with the given parameter configuration.
  • Figure 4: The exponent $x$ of the average cost $c = m^x$ of the bidirectional BFS over 100 $st$-pairs.
  • Figure 5: The exponent $x$ of the number of BFSs $c = n^x$ of the iFUB algorithms. Different from the rest of the paper, the GIRG ground space is a square instead of a torus.
  • ...and 12 more figures
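
The captions of Figures 4 and 5 report an exponent $x$ defined through $c = m^x$ (or $c = n^x$). Solving for $x$ gives $x = \log c / \log m$. The sketch below shows only this arithmetic. The helper name `cost_exponent` and the sample numbers are hypothetical, and how the paper aggregates costs before taking the exponent is not stated in this summary.

```python
import math


def cost_exponent(cost, size):
    """Solve cost = size ** x for x, i.e. x = log(cost) / log(size).

    `cost` is a measured cost (e.g. explored edges or number of BFSs),
    `size` is the graph size used as the base (m or n). Both must be > 1
    in this sketch so that the logarithms are positive.
    """
    if cost <= 1 or size <= 1:
        raise ValueError("cost and size must both be greater than 1")
    return math.log(cost) / math.log(size)


# Hypothetical numbers: a search touching about 1000 edges in a graph
# with 1,000,000 edges has an exponent close to 0.5 (sublinear),
# while touching every edge gives an exponent of 1 (linear).
print(cost_exponent(1_000, 1_000_000))
print(cost_exponent(1_000_000, 1_000_000))
```

An exponent near 1 then corresponds to cost that grows linearly in the graph size, and an exponent clearly below 1 to sublinear cost.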

Theorems & Definitions (2)

  • Lemma 1
  • Lemma 2