Engineering Semi-streaming DFS algorithms

Kancharla Nikhilesh Bhagavan; Macharla Sri Vardhan; Madamanchi Ashok Chowdary; Shahbaz Khan

TL;DR

This work tackles the problem of computing a DFS tree in the semi-streaming model, where memory is limited and multiple passes over the input are allowed. It builds on prior space-pass trade-offs and introduces practical heuristics (H1–H3) to improve the existing kPath and kLev algorithms under a fixed budget of $nk$ stored edges. Through extensive experiments on real and synthetic graphs (uniform and power-law), the authors show substantial reductions in the number of passes, often reaching the optimal single pass and requiring at most two passes in the worst case on random graphs. The results highlight the value of practical heuristics in making semi-streaming DFS algorithms viable for large-scale graphs, with clear guidance on when to prefer the simpler scheme over the more complex one.

Abstract

Depth first search (DFS) is a fundamental graph problem with a wide range of applications. For a graph $G=(V,E)$ having $n$ vertices and $m$ edges, the DFS tree can be computed in $O(m+n)$ time using $O(m)$ space, where $m=O(n^2)$. In the streaming environment, most graph problems are studied in the semi-streaming model, where several passes (preferably one) are allowed over the input, with $O(nk)$ local space for some $k=o(n)$. Trivially, using $O(m)$ space, DFS can be computed in one pass, and using $O(n)$ space, it can be computed in $O(n)$ passes. Khan and Mehta [STACS19] presented several algorithms allowing trade-offs between space and passes, where $O(nk)$ space results in $O(n/k)$ passes. They also showed empirically that their algorithm requires only a few passes in practice, even with $O(n)$ space. Chang et al. [STACS20] presented an alternate proof for the same and also presented an $O(\sqrt{n})$-pass algorithm requiring $O(n\,\mathrm{poly}\log n)$ space, with a finer trade-off between space and passes. However, their algorithm uses complex black-box algorithms, making it impractical. We perform an experimental analysis of the practical semi-streaming DFS algorithms. Our analysis ranges from real graphs to random graphs (uniform and power-law). We also present several heuristics to improve the state-of-the-art algorithms and study their impact. Our heuristics improve the state of the art by $40$-$90\%$, achieving the optimal one pass in almost $40$-$50\%$ of cases (improved from zero). In random graphs, they improve it by $30$-$90\%$, again requiring the optimal one pass for even very small values of $k$. Overall, our heuristics significantly improved the relatively complex state-of-the-art algorithm, which now requires merely two passes in the worst case for random graphs. Additionally, our heuristics made the relatively simpler algorithm practically usable even for very small space bounds, where it was previously impractical.
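
To make the trivial bound above concrete, the following is a minimal sketch of one way to obtain a DFS tree with $O(n)$ space and $O(n)$ passes, by adding one vertex to the tree per pass. It is an illustration only, not the authors' implementation and not the kPath or kLev algorithms. The function name `trivial_semi_streaming_dfs` and the `stream` callable (which replays the edge list once per call to emulate a pass) are assumptions made for this example. The sketch relies on the standard DFS invariant that every visited vertex off the current root-to-leaf path has no unvisited neighbours, so each pass only needs the deepest path vertex that still has one.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Edge = Tuple[int, int]


def trivial_semi_streaming_dfs(
    stream: Callable[[], Iterable[Edge]], root: int = 0
) -> Tuple[Dict[int, Optional[int]], int]:
    """Return (parent map of a DFS tree of root's component, passes used).

    Only O(n) words are kept: the parent map and the current DFS path.
    Edges are never stored; each call to stream() is one pass over the input.
    """
    parent: Dict[int, Optional[int]] = {root: None}  # visited vertices
    depth: Dict[int, int] = {root: 0}  # depths of vertices on the current path only
    path: List[int] = [root]
    passes = 0
    while path:
        passes += 1
        best: Optional[Edge] = None
        for u, v in stream():
            # Treat the undirected edge in both orientations.
            for a, b in ((u, v), (v, u)):
                # a must lie on the current path and b must be unvisited.
                if a in depth and b not in parent:
                    if best is None or depth[a] > depth[best[0]]:
                        best = (a, b)
        if best is None:
            break  # no unvisited neighbour remains: the tree is complete
        a, b = best
        # Backtrack: vertices below a on the path have no unvisited neighbours.
        while path[-1] != a:
            depth.pop(path.pop())
        parent[b] = a
        depth[b] = depth[a] + 1
        path.append(b)
    return parent, passes


# Usage on a small graph; the lambda replays the edge list for every pass.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
parent, passes = trivial_semi_streaming_dfs(lambda: iter(edges), root=0)
print(parent, passes)
```

On a connected graph this sketch uses one pass per added vertex plus a final pass that finds no edge, which is the behaviour that the space-pass trade-offs and heuristics discussed in the abstract aim to improve.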

Paper Structure (21 sections, 13 figures, 3 tables)

Introduction
Related Work
Our Results
Preliminaries
Previous work
Khan and Mehta [STACS19]
Chang et al. [STACS20]
Proposed Heuristics
Experimental Setup
Algorithms and Implementation details
Evaluation Metrics
Environment Details
Datasets
Results
Real Graphs
...and 6 more sections

Figures (13)

Figure 1: Average improvement in required passes for kPath and kLev for real datasets.
Figure 2: Performance of the faster algorithms as the number of vertices is varied with (a) $m=O(n\log n)$, and (b) $m=O(n\sqrt{n})$ densities, using $10n$ edges ($k=10$).
Figure 3: Performance of the algorithms as the number of edges is varied up to $O(n^2)$ for $n=1000$ vertices and different numbers of allowed edges ($k=2,5,10$) for (a) kPath, and (b) kLev.
Figure 4: Performance of the algorithms as the number of stored edges is varied up to $k=n$ for $n=1000$ and different values of $m=O(n\log n), O(n\sqrt{n}), O(n^2)$ for (a) kPath, and (b) kLev.
Figure 5: Performance of the faster algorithms as the number of vertices is varied with (a) $m=O(n\log n)$, and (b) $m=O(n\sqrt{n})$ densities, using $10n$ edges ($k=10$).
...and 8 more figures
