
Engineering Semi-streaming DFS algorithms

Kancharla Nikhilesh Bhagavan, Macharla Sri Vardhan, Madamanchi Ashok Chowdary, Shahbaz Khan

TL;DR

This work tackles the problem of computing a DFS tree in the semi-streaming model, where memory is limited and multiple passes over the input are allowed. It builds on prior space-pass trade-offs and introduces practical heuristics (H1–H3) that improve the existing kPath and kLev algorithms under a fixed budget of $nk$ stored edges. Through extensive experiments on real and synthetic graphs (uniform and power-law), the authors show substantial reductions in the number of passes. The improved algorithms often achieve the optimal single pass and need at most two passes on random graphs in the worst case. The results show that practical heuristics make semi-streaming DFS algorithms viable for large-scale graphs, and they give clear guidance on when to prefer the simpler scheme over the more complex one.

Abstract

Depth-first search (DFS) is a fundamental graph problem with a wide range of applications. For a graph $G=(V,E)$ with $n$ vertices and $m$ edges, the DFS tree can be computed in $O(m+n)$ time using $O(m)$ space, where $m=O(n^2)$. In the streaming setting, most graph problems are studied in the semi-streaming model, which allows several passes (preferably one) over the input and $O(nk)$ local space for some $k=o(n)$. Trivially, DFS can be computed in one pass using $O(m)$ space, or in $O(n)$ passes using $O(n)$ space. Khan and Mehta [STACS19] presented several algorithms that trade space for passes, where $O(nk)$ space yields $O(n/k)$ passes. They also showed empirically that their algorithm requires only a few passes in practice, even with $O(n)$ space. Chang et al. [STACS20] gave an alternative proof of the same trade-off and also presented an $O(\sqrt{n})$-pass algorithm using $O(n\,\mathrm{poly}\log n)$ space, with a finer trade-off between space and passes. However, their algorithm relies on complex black-box subroutines, which makes it impractical. We perform an experimental analysis of the practical semi-streaming DFS algorithms, ranging from real graphs to random graphs (uniform and power-law). We also present several heuristics that improve the state-of-the-art algorithms and study their impact. Our heuristics improve the state of the art by $40$–$90\%$, achieving the optimal single pass in almost $40$–$50\%$ of cases (up from zero). On random graphs, they yield improvements of $30$–$90\%$, again achieving the optimal single pass even for very small values of $k$. Overall, our heuristics significantly improve the relatively complex state-of-the-art algorithm, which then requires merely two passes on random graphs in the worst case. Additionally, they make the relatively simple algorithm practical even for very small space bounds, where it was previously impractical.
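The trivial $O(n)$-space, $O(n)$-pass bound mentioned in the abstract can be realized in more than one way. Below is a minimal sketch of one of them. It assumes the stream is an in-memory list of undirected edges over vertices $0,\dots,n-1$ that may be re-scanned once per pass. The function name `streaming_dfs_one_vertex_per_pass` and its interface are hypothetical, and this is not the authors' kPath or kLev code. In each pass, the sketch finds the deepest vertex on the current DFS path that still has an unvisited neighbour, backtracks to it, and attaches that neighbour as its child. Only the path, the visited flags, and the parent pointers ($O(n)$ words) are kept between passes.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[int, int]


def streaming_dfs_one_vertex_per_pass(
    n: int, edges: List[Edge]
) -> Tuple[Dict[int, Optional[int]], int]:
    """DFS over an edge stream with O(n) working memory (illustrative sketch).

    Vertices are 0..n-1 (n >= 1) and edges are undirected. `edges` plays the
    role of the read-only stream: it is only scanned front to back, once per
    pass, and never copied. Returns (parent, passes) with parent[root] = None.
    """
    visited = [False] * n
    parent: Dict[int, Optional[int]] = {}
    stack: List[int] = []       # root-to-current-vertex path of the DFS
    depth: Dict[int, int] = {}  # stack position of each vertex on the path

    def push(v: int, p: Optional[int]) -> None:
        visited[v] = True
        parent[v] = p
        depth[v] = len(stack)
        stack.append(v)

    push(0, None)
    passes = 0
    while len(parent) < n:
        passes += 1
        best: Optional[Edge] = None  # (deepest path vertex, unvisited neighbour)
        for a, b in edges:  # one sequential pass over the stream
            for x, v in ((a, b), (b, a)):
                if x in depth and not visited[v]:
                    if best is None or depth[x] > depth[best[0]]:
                        best = (x, v)
        if best is None:
            # No path vertex has an unvisited neighbour: this component is
            # finished, so start a new DFS tree at the lowest unvisited vertex.
            stack.clear()
            depth.clear()
            push(visited.index(False), None)
        else:
            x, v = best
            # Backtrack to x (every vertex above it has no unvisited
            # neighbour), then extend the path with v as a child of x.
            while stack[-1] != x:
                depth.pop(stack.pop())
            push(v, x)
    return parent, passes
```

For example, `streaming_dfs_one_vertex_per_pass(4, [(0, 1), (1, 2), (2, 3), (0, 3)])` builds the path 0-1-2-3 in three passes. Each pass either adds one vertex or, on a disconnected graph, discovers that a component is finished. The sketch therefore always uses exactly $n-1$ passes, matching the trivial $O(n)$ bound. The algorithms studied in the paper use the additional $O(nk)$ space to reduce this to $O(n/k)$ passes.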

Paper Structure (21 sections, 13 figures, 3 tables)

Figures (13)

  • Figure 1: Average improvement in required passes for kPath and kLev for real datasets.
  • Figure 2: Performance of the faster algorithms as the number of vertices is varied with (a) $m=O(n\log n)$, and (b) $m=O(n\sqrt{n})$ densities, using $10n$ edges ($k=10$).
  • Figure 3: Performance of the algorithms as the number of edges is varied up to $O(n^2)$ for $n=1000$ vertices and different values of allowed edges ($k=2,5,10$) for (a) kPath, and (b) kLev.
  • Figure 4: Performance of the algorithms as the number of stored edges is varied up to $k=n$ for $n=1000$ and different values of $m=O(n\log n), O(n\sqrt{n}), O(n^2)$ for (a) kPath, and (b) kLev.
  • Figure 5: Performance of the faster algorithms as the number of vertices is varied with (a) $m=O(n\log n)$, and (b) $m=O(n\sqrt{n})$ densities, using $10n$ edges ($k=10$).
  • ...and 8 more figures