On the External Validity of Average-Case Analyses of Graph Algorithms
Thomas Bläsius, Philipp Fischbeck
TL;DR
The paper asks whether average-case analyses of graph algorithms translate to real-world performance. To answer this, it systematically compares algorithm behavior on generated graphs with controllable locality and heterogeneity against a large collection of sparse real networks. It finds strong external validity for several core algorithms: locality and heterogeneity largely determine practical performance, and model-based insights often carry over to real networks. The authors give detailed definitions of heterogeneity and locality, compare multiple network models (ER, Chung–Lu, GIRG), and evaluate six algorithms (bidirectional BFS, diameter estimation, vertex cover domination, Louvain clustering, maximal clique enumeration, and chromatic-number reduction). The findings suggest that understanding and controlling locality and heterogeneity yields meaningful predictions for real-world inputs, with practical implications for algorithm design and benchmarking. The authors advocate the continued use of locality-focused models such as GIRGs to study average-case behavior.
Abstract
The number one criticism of average-case analysis is that we do not actually know the probability distribution of real-world inputs. Thus, analyzing an algorithm on some random model has no implications for practical performance. At its core, this criticism doubts the existence of external validity, i.e., it assumes that algorithmic behavior on the somewhat simple and clean models does not translate beyond the models to practical performance on real-world inputs. With this paper, we provide a first step towards studying the question of external validity systematically. To this end, we evaluate the performance of six graph algorithms on a collection of 2740 sparse real-world networks depending on two properties: heterogeneity (variance in the degree distribution) and locality (tendency of edges to connect vertices that are already close). We compare this with the performance on generated networks with varying locality and heterogeneity. We find that the performance in the idealized setting of network models translates surprisingly well to real-world networks. Moreover, heterogeneity and locality appear to be the core properties impacting the performance of many graph algorithms.
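
To make the two properties concrete, the following is a minimal sketch of how they might be measured on a small undirected graph. It is not the paper's formal definition, which this summary does not reproduce. As assumptions for illustration only, heterogeneity is reported as the population variance of the degrees together with the coefficient of variation, and locality is approximated by the fraction of edges whose endpoints share at least one common neighbor. The function names `heterogeneity`, `locality_proxy`, and `build_adj` are hypothetical.

```python
from statistics import mean, pvariance


def build_adj(edge_list):
    """Build an undirected adjacency map {vertex: set of neighbors}."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def heterogeneity(adj):
    """Return (degree variance, coefficient of variation).

    Assumption: the paper's exact normalization may differ; this only
    illustrates "variance in the degree distribution".
    """
    degs = [len(nbrs) for nbrs in adj.values()]
    mu = mean(degs)
    var = pvariance(degs)
    cv = (var ** 0.5) / mu if mu > 0 else 0.0
    return var, cv


def locality_proxy(adj):
    """Fraction of edges whose endpoints have a common neighbor.

    Assumption: a simple stand-in for "edges connect vertices that are
    already close"; such endpoints stay at distance 2 without the edge.
    """
    edges = {(u, v) for u in adj for v in adj[u] if u < v}
    if not edges:
        return 0.0
    closed = sum(1 for u, v in edges if adj[u] & adj[v])
    return closed / len(edges)


# A triangle with one pendant vertex: fairly local, nearly homogeneous.
triangle_with_tail = build_adj([(0, 1), (1, 2), (0, 2), (2, 3)])
# A star: no edge is backed by a common neighbor, degrees vary strongly.
star = build_adj([(0, 1), (0, 2), (0, 3), (0, 4)])

for name, g in [("triangle+tail", triangle_with_tail), ("star", star)]:
    var, cv = heterogeneity(g)
    print(name, "variance:", var, "cv:", round(cv, 3),
          "locality:", locality_proxy(g))
```

On these two toy graphs the star scores higher on heterogeneity and lower on the locality proxy than the triangle with a tail, which matches the intuition behind the two properties.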
