Incremental Topological Ordering and Cycle Detection with Predictions

Samuel McCauley, Benjamin Moseley, Aidin Niaparast, Shikha Singh

TL;DR

This work tackles maintaining a topological ordering and detecting cycles in directed graphs under edge insertions, using learning-augmented methods. It introduces a coarse-grained prediction model and two data structures: a decomposition-based Ideal Learned Ordering whose provable worst-case guarantees adapt to prediction quality, and a practical Learned DFS Ordering (LDFS) that achieves $O(m\eta)$ total time, with per-edge cost $O(\eta)$. Theoretical results show smooth interpolation between ideal and worst-case performance, and experiments on real temporal DAGs demonstrate substantial speedups using modest training data and robustness to prediction errors. Overall, the paper bridges theory and practice in dynamic graphs by leveraging predictions to improve both asymptotic guarantees and empirical performance, highlighting potential for broader learning-augmented data-structure design.

Abstract

This paper leverages the framework of algorithms-with-predictions to design data structures for two fundamental dynamic graph problems: incremental topological ordering and cycle detection. In these problems, the input is a directed graph on $n$ nodes, and the $m$ edges arrive one by one. The data structure must maintain a topological ordering of the vertices at all times and detect if the newly inserted edge creates a cycle. The theoretically best worst-case algorithms for these problems have high update cost (polynomial in $n$ and $m$). In practice, greedy heuristics (that recompute the solution from scratch each time) perform well but can have high update cost in the worst case. In this paper, we bridge this gap by leveraging predictions to design a new learned data structure for the problems. Our data structure guarantees consistency, robustness, and smoothness with respect to predictions -- that is, it has the best possible running time under perfect predictions, never performs worse than the best-known worst-case methods, and its running time degrades smoothly with the prediction error. Moreover, we demonstrate empirically that predictions, learned from a very small training dataset, are sufficient to provide significant speed-ups on real datasets.
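
To make the problem setting concrete, here is a minimal Python sketch of a naive baseline that recomputes from scratch on every insertion. It is not the paper's learned data structure; the class name `NaiveIncrementalDAG` and its methods are hypothetical, and the sketch assumes that an edge which would close a cycle is reported and then rejected.

```python
from collections import defaultdict, deque


class NaiveIncrementalDAG:
    """Hypothetical from-scratch baseline, not the paper's data structure."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.succ = defaultdict(list)  # adjacency lists of accepted edges

    def _reaches(self, src, dst):
        # Iterative DFS from src; True if dst is reachable (src == dst counts).
        stack, seen = [src], {src}
        while stack:
            x = stack.pop()
            if x == dst:
                return True
            for y in self.succ[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return False

    def insert(self, u, v):
        """Insert edge (u, v); return True if it would create a cycle.

        Assumption of this sketch: a cycle-creating edge is rejected.
        """
        if self._reaches(v, u):
            return True
        self.succ[u].append(v)
        return False

    def order(self):
        # Kahn's algorithm: recompute a topological order of the current graph.
        indeg = {x: 0 for x in self.nodes}
        for u in self.nodes:
            for v in self.succ[u]:
                indeg[v] += 1
        queue = deque(x for x in self.nodes if indeg[x] == 0)
        out = []
        while queue:
            x = queue.popleft()
            out.append(x)
            for y in self.succ[x]:
                indeg[y] -= 1
                if indeg[y] == 0:
                    queue.append(y)
        return out
```

Each call to `insert` may traverse the whole graph, which illustrates the high worst-case update cost that the abstract attributes to from-scratch heuristics.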

Paper Structure

This paper contains 40 sections, 11 theorems, 13 equations, 5 figures, 2 tables.

Key Result

Lemma 3.4

If the insertion of the last edge creates a cycle in $G_t$, the simple learned algorithm correctly detects and reports it. Furthermore, for any edge $e = (u, v)$ in the graph $G_t$ at time $t$, $L(u) < L(v)$.
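
The ordering invariant in the lemma, $L(u) < L(v)$ for every edge $(u, v)$, can be checked directly. The following Python sketch is only an illustration; the helper name `violating_edges` and the dictionary representation of the labels are assumptions, not part of the paper.

```python
def violating_edges(edges, label):
    """Return the edges (u, v) that break the invariant label[u] < label[v].

    `edges` is an iterable of (u, v) pairs and `label` maps each node to a
    comparable label. Both representations are assumptions of this sketch.
    """
    return [(u, v) for (u, v) in edges if not label[u] < label[v]]
```

An empty result means the labels form a valid topological ordering of the given edges. If the edges contain a cycle, no strict labelling can satisfy every edge, so at least one edge is always returned.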

Figures (5)

  • Figure 1: Total cost (number of nodes and edges processed) of LDFS compared to the two baselines for the email-Eu-core dataset, in logarithmic scale. In the training-set-size panel, the x-axis is the percentage of the input sequence used as training data for LDFS. The robustness panel shows the effect of adding noise to predictions on the cost of LDFS. The first 5% of the input is used as the training data and the last 95% as the test data. For different values of $C$, normal noise with mean 0 and standard deviation (SD) of $C \cdot \text{SD(predictions)}$ is independently added to each prediction. This noise is regenerated 10 times. The x-axis is SD(noise)/SD(predictions). The blue line is the mean and the cloud around it is the SD of these experiments.
  • Figure 2: Performance comparison for different edge densities on synthetic DAGs (in logarithmic scale). The number of nodes is $n=1000$, and we increase $p$ along the x-axis (in logarithmic scale). We use the first 5% of the input as the training data for LDFS (our algorithm), and the last 95% is used as the test data for all the algorithms. The blue lines correspond to the results for LDFS, with different amounts of perturbation added to the predictions. The perturbation is normal noise with mean 0 and standard deviation $C \cdot \text{SD(predictions)}$ that is independently added to each prediction, where SD(predictions) is the standard deviation of the initial predictions. We include the results for $C=0,1,2$. The blue lines are the average of 5 different runs, each time regenerating the noise. The cost panel and the runtime panel illustrate the cost (number of nodes and edges processed) and the runtime of these experiments, respectively.
  • Figure 3: email-Eu-core
  • Figure 4: CollegeMsg
  • Figure 5: Math Overflow
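
The noise model described in the captions of Figures 1 and 2 can be sketched in a few lines of Python. This is an illustration only: the function name `perturb_predictions` and the `seed` parameter are hypothetical, and the use of the population standard deviation (`statistics.pstdev`) is an assumption, since the captions do not say which estimator is used.

```python
import random
import statistics


def perturb_predictions(predictions, C, seed=None):
    """Add independent normal noise with mean 0 and SD C * SD(predictions).

    Assumption of this sketch: SD(predictions) is the population standard
    deviation of the unperturbed predictions.
    """
    rng = random.Random(seed)
    sd = statistics.pstdev(predictions)
    return [p + rng.gauss(0.0, C * sd) for p in predictions]
```

With `C = 0` the predictions are returned unchanged. Regenerating the noise for repeated runs, as the captions describe, would correspond to calling the function with different seeds.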

Theorems & Definitions (14)

  • Lemma 3.4
  • Lemma 3.5
  • proof
  • Lemma 3.6
  • proof
  • Lemma 3.7
  • Theorem 3.8
  • Lemma 4.2
  • proof
  • Lemma 4.3
  • ...and 4 more