Fast and Flexible Flow Decompositions in General Graphs via Dominators

Francisco Sena, Alexandru I. Tomescu

TL;DR

This work extends dominator-tree-based safe-sequence techniques from DAGs to general graphs with cycles, in order to accelerate flow-decomposition MILPs. The authors compute maximal safe sequences from the $s$- and $t$-dominator trees of the graph, then use them together with the graph's SCC structure to fix walk variables: variables on a safe sequence are set to (at least) 1, and variables on edges not reachable to and from the sequence are set to 0. This reduces model size and eliminates costly linearizations of variable products. They show that all maximal safe sequences can be found in time linear in their size, and provide a practical framework implemented in the Flowpaths library. Across three decomposition models on four bacterial datasets, the pre-processing yields up to thousand-fold speedups. The method makes flow decomposition in cyclic graphs practical, with potential impact on multi-assembly tasks such as metagenomics and strain-resolved viral assembly.

Abstract

Multi-assembly methods rely at their core on a flow decomposition problem, namely, decomposing a weighted graph into weighted paths or walks. However, most results over the past decade have focused on decompositions over directed acyclic graphs (DAGs). This limitation has led either to purely heuristic methods or, in applications, to transforming a graph with cycles into a DAG via preprocessing heuristics. In this paper we show that flow decomposition problems can also be solved in practice on general graphs with cycles, via a framework that yields fast and flexible Mixed Integer Linear Programming (MILP) formulations. Our key technique relies on the graph-theoretic notion of dominator tree, which we use to find all safe sequences of edges, which are guaranteed to appear in some walk of any flow decomposition solution. We generalize previous results from DAGs to cyclic graphs by showing that maximal safe sequences correspond to extensions of common leaves of two dominator trees, and that we can find all of them in time linear in their size. Using these, we can accelerate MILPs for any flow decomposition into walks in general graphs, by setting to (at least) 1 suitable variables encoding solution walks, and by setting to 0 other walk variables that are not reachable to and from safe sequences. This reduces model size and eliminates costly linearizations of MILP variable products. We experiment with three decomposition models (Minimum Flow Decomposition, Least Absolute Errors and Minimum Path Error) on four bacterial datasets. Our pre-processing enables up to thousand-fold speedups and solves in under 30 seconds many instances that otherwise time out. We thus hope that our dominator-based MILP simplification framework and the accompanying software library can become building blocks in multi-assembly applications.

Paper Structure

This paper contains 27 sections, 11 theorems, 13 equations, 8 figures, 3 tables.

Key Result

Theorem 1

Let $G=(V,E)$ be an $s$-$t$ graph. A sequence $X$ of vertices is safe for walk covers if and only if there exists a vertex $v \in V$ such that $X$ is a subsequence of $\mathsf{extension}(v)$.
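
As a minimal sketch of how Theorem 1 could be checked on a small instance, the Python below assumes that $\mathsf{extension}(v)$ is the path from $s$ to $v$ in the $s$-dominator tree followed by the path from $v$ to $t$ in the $t$-dominator tree, as the Figure 1 caption suggests. The graph `EDGES` is hypothetical (it is not taken from the paper's figures), the function names are illustrative, and the naive fixed-point dominator computation is a stand-in: it is not the linear-time procedure the paper refers to.

```python
def dominators(edges, root):
    """Return {v: set of vertices lying on every root-to-v path}.

    Naive iterative fixed point; assumes every vertex is reachable from root.
    """
    nodes = {x for edge in edges for x in edge}
    preds = {v: set() for v in nodes}
    for u, v in edges:
        preds[v].add(u)
    dom = {v: set(nodes) for v in nodes}
    dom[root] = {root}
    changed = True
    while changed:
        changed = False
        for v in nodes - {root}:
            new = set.intersection(*(dom[p] for p in preds[v])) | {v}
            if new != dom[v]:
                dom[v], changed = new, True
    return dom


def extension(edges, s, t, v):
    """Assumed definition: s-dominator path s..v, then t-dominator path v..t."""
    dom_s = dominators(edges, s)
    dom_t = dominators([(b, a) for a, b in edges], t)  # dominators in the reversed graph
    # Dominator sets are strictly nested along a tree path, so their sizes order the path.
    down = sorted(dom_s[v], key=lambda d: len(dom_s[d]))                 # s ... v
    up = sorted(dom_t[v], key=lambda d: len(dom_t[d]), reverse=True)    # v ... t
    return down + up[1:]


def is_subsequence(xs, ys):
    it = iter(ys)
    return all(x in it for x in xs)


def is_safe(edges, s, t, xs):
    """Theorem 1: xs is safe iff it is a subsequence of extension(v) for some v."""
    nodes = {x for edge in edges for x in edge}
    return any(is_subsequence(xs, extension(edges, s, t, v)) for v in nodes)


# Hypothetical s-t graph with a cycle a -> d -> e -> a.
EDGES = [("s", "a"), ("s", "u"), ("a", "d"), ("a", "g"), ("g", "d"), ("d", "e"),
         ("d", "f"), ("e", "a"), ("f", "t"), ("u", "v"), ("v", "w"), ("w", "t")]

print(extension(EDGES, "s", "t", "e"))  # ['s', 'a', 'd', 'e', 'a', 'd', 'f', 't']
print(is_safe(EDGES, "s", "t", ["a", "e", "a", "f"]))  # True
print(is_safe(EDGES, "s", "t", ["g", "e"]))            # False
```

Note that a vertex can appear twice in an extension (here `a` and `d`) because the two tree paths may revisit vertices of the same cycle.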

Figures (8)

  • Figure 1: Example of a graph $G$, its $s$- and $t$-dominator trees, and four $C$-safe sequences in $G$, for $C = V$. For illustration purposes, in a sequence we draw univocal edges between nodes as solid lines, and other connections as dashed: for example, the yellow safe sequence is $(v_0,v_{20},v_{21},v_{20},v_{23},v_{24})$. Because of \ref{thm:cores-vertices}, maximal safe sequences are obtained by concatenating the paths in the $s$- and $t$-dominator trees: for example, the yellow sequence is obtained by concatenating the path $v_0v_{20}v_{21}$ in the $s$-dominator tree with the path $v_{21}v_{20}v_{23}v_{24}$ in the $t$-dominator tree.
  • Figure 2: Example of a graph $G$ and its blue-dominator trees with respect to $C = \{a,e,g,u,w\}$. Lower opacity vertices are not part of the blue-dominator trees. Underlying the blue-dominator trees are the dominator trees of $G$ for $C=V$. Vertex $e$ is a blue-child of vertex $a$ in the $s$-dominator tree (and vertex $a$ is its closest blue-ancestor). The path $uw$ is maximal $C$-univocal and it is collapsed into vertex $u$, which stores the sequence $(v,w)$. The path $ea$ is not $C$-univocal because of the position of vertex $g$ relative to $\{a,e\}$ in the $s$-dominator tree. The maximal $C$-safe sequences are: $\mathsf{extension}(u)=\mathsf{extension}(w)=(s,u,v,w,t)$, $\mathsf{extension}(e)=(s,a,d,e,a,d,f,t)$, and $\mathsf{extension}(g)=(s,a,g,d,f,t)$.
  • Figure 3: Fixing walk variables using incompatible safe sequences of edges. For the $i$-th safe sequence (assumed here in the order red, blue, violet, yellow), and for every edge $uv$ in it, we set $x_{uv,i} = 1$ if $uv$ connects different SCCs (it cannot be traversed more than once by the $i$-th walk, since $G^{scc}$ is acyclic). If $uv$ lies within an SCC, we set $x_{uv,i}$ to at least the number of times $uv$ appears in the sequence (it must be traversed at least this many times, but possibly more). When fixing variables to 0, note that for the red sequence (1st), edge $(j,h)$ does not reach $s$, is not reached by $t$, and while $j$ is reachable from $a$, $h$ does not reach $c$; thus it cannot be used by the first solution walk.
  • Figure 4: The four blue edges form a maximum-weight edge antichain of $G$, of weight 20. In $G'$, every non-trivial SCC $u^i$ of $G$ (i.e. with at least one edge) is replaced by a gadget made up of a single edge $u^i_{in}u^i_{out}$ with weight equaling the maximum weight of an edge in that SCC. Every trivial SCC of $G$ (i.e. with no edges) is kept as a single vertex. All edges between SCCs are replaced by a path of length 2, with a private vertex (in gray) for every edge; the first edge of this length-2 path has the same weight as the original edge, and the second edge of this length-2 path has weight 0 (not shown). The maximum-weight edge antichain of $G'$, corresponding to the one shown for $G$, is also shown in blue.
  • ...and 4 more figures
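
The variable-fixing rule described in the Figure 3 caption can be sketched as follows. This is an illustration under stated assumptions, not the Flowpaths implementation: the graph, the safe sequence of edges (taken as given rather than computed), and the function names are all hypothetical, and only the "set to (at least) 1" part of the rule is covered. An edge $uv$ is treated as lying within an SCC exactly when $v$ can reach $u$.

```python
from collections import Counter


def reachable(succ, start):
    """Vertices reachable from start (including start) by iterative DFS."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in succ.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def walk_variable_bounds(edges, safe_sequence):
    """Map each edge of the safe sequence to a constraint on its walk variable.

    ("==", 1)  : edge connects different SCCs, so the walk uses it exactly once.
    (">=", k)  : edge lies within an SCC and appears k times in the sequence.
    """
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
    reach = {x: reachable(succ, x) for edge in edges for x in edge}
    bounds = {}
    for (u, v), k in Counter(safe_sequence).items():
        same_scc = u in reach[v]  # u -> v is an edge, so v reaching u closes a cycle
        bounds[(u, v)] = (">=", k) if same_scc else ("==", 1)
    return bounds


# Hypothetical graph: cycle a -> b -> c -> a, entered from s and left through b -> t.
EDGES = [("s", "a"), ("a", "b"), ("b", "c"), ("c", "a"), ("b", "t")]
# Assumed safe sequence for a walk that must traverse edge (c, a).
SAFE = [("s", "a"), ("a", "b"), ("b", "c"), ("c", "a"), ("a", "b"), ("b", "t")]

for edge, bound in walk_variable_bounds(EDGES, SAFE).items():
    print(edge, bound)
# ('s', 'a') ('==', 1)
# ('a', 'b') ('>=', 2)
# ('b', 'c') ('>=', 1)
# ('c', 'a') ('>=', 1)
# ('b', 't') ('==', 1)
```

In an MILP, an `("==", 1)` entry would fix the variable outright, while a `(">=", k)` entry would only tighten its lower bound, since the walk may go around the cycle additional times.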

Theorems & Definitions (20)

  • Definition 1: Flow decomposition variants
  • Theorem 1
  • Lemma 1
  • Definition 2: Univocal path
  • Lemma 2
  • Theorem 2: Characterization of maximal safe sequences
  • Theorem 3: Optimal enumeration and representation of safe sequences
  • Theorem 4: Characterization of safe sequences for $C$-walk covers
  • Theorem 5: Characterization of maximal safe sequences for $C$-walk covers
  • Theorem 6: Maximal safe sequence enumeration for $C$-walk covers
  • ...and 10 more