Fast and Flexible Flow Decompositions in General Graphs via Dominators
Francisco Sena, Alexandru I. Tomescu
TL;DR
This work extends dominator-tree–based safe-sequence techniques from DAGs to general graphs with cycles to accelerate flow-decomposition MILPs. By computing maximal safe sequences via the condensation of dominator trees and the graph’s SCC structure, the authors fix many MILP variables to (at least) 1 or to 0, reducing model size and eliminating costly linearizations of variable products. They prove that all maximal safe sequences can be found in time linear in their size, and provide a practical framework implemented in the Flowpaths library, demonstrating up to thousand-fold speedups on four bacterial datasets across three decomposition models. The method enables fast, exact or near-exact flow decompositions in cyclic graphs, with significant potential impact on multi-assembly tasks such as metagenomics and strain-resolved viral assembly.
Abstract
Multi-assembly methods rely at their core on a flow decomposition problem, namely, decomposing a weighted graph into weighted paths or walks. However, most results over the past decade have focused on decompositions over directed acyclic graphs (DAGs). This limitation has led either to purely heuristic methods or, in applications, to transforming a graph with cycles into a DAG via preprocessing heuristics. In this paper we show that flow decomposition problems can also be solved in practice on general graphs with cycles, via a framework that yields fast and flexible Mixed Integer Linear Programming (MILP) formulations. Our key technique relies on the graph-theoretic notion of a dominator tree, which we use to find all safe sequences of edges, that is, sequences guaranteed to appear in some walk of any flow decomposition solution. We generalize previous results from DAGs to cyclic graphs by showing that maximal safe sequences correspond to extensions of common leaves of two dominator trees, and that we can find all of them in time linear in their size. Using these, we can accelerate MILPs for any flow decomposition into walks in general graphs, by setting to (at least) 1 suitable variables encoding solution walks, and by setting to 0 other walk variables that are unreachable to and from safe sequences. This reduces model size and eliminates costly linearizations of MILP variable products. We experiment with three decomposition models (Minimum Flow Decomposition, Least Absolute Errors and Minimum Path Error) on four bacterial datasets. Our preprocessing enables speedups of up to a thousand-fold, and many instances that otherwise time out are solved in under 30 seconds. We thus hope that our dominator-based MILP simplification framework and the accompanying software library can become building blocks in multi-assembly applications.
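To illustrate the notion of dominators that the abstract relies on, the following minimal Python sketch computes dominator sets in a small graph with a cycle. It is not the paper's algorithm and does not use the accompanying library: it works on nodes rather than edges, uses a simple iterative fixed-point computation instead of a linear-time method, and the graph, the function name `dominators` and the node labels are all hypothetical. It only shows the underlying idea that a node dominating t from s must appear on every walk from s to t, even when the graph contains cycles.

```python
def dominators(graph, source):
    """Return dom[v]: the set of nodes that lie on every walk from source to v.

    graph maps each node to a list of its out-neighbours (assumed format).
    Simple iterative fixed point; not a linear-time algorithm.
    """
    # Collect the nodes reachable from the source with a depth-first search.
    reachable, stack = set(), [source]
    while stack:
        u = stack.pop()
        if u in reachable:
            continue
        reachable.add(u)
        stack.extend(graph.get(u, []))

    # Build predecessor lists restricted to reachable nodes.
    preds = {v: [] for v in reachable}
    for u in reachable:
        for v in graph.get(u, []):
            preds[v].append(u)

    # Start from "everything dominates everything" and shrink to a fixed point.
    dom = {v: set(reachable) for v in reachable}
    dom[source] = {source}
    changed = True
    while changed:
        changed = False
        for v in reachable:
            if v == source:
                continue
            new = set.intersection(*(dom[p] for p in preds[v])) | {v}
            if new != dom[v]:
                dom[v] = new
                changed = True
    return dom


# Hypothetical example graph with a cycle between a and b.
example_graph = {
    "s": ["a"],
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["t"],
    "t": [],
}
dom = dominators(example_graph, "s")
# Nodes that every s-t walk must visit, regardless of how often it loops a <-> b.
forced_nodes = sorted(dom["t"])
print(forced_nodes)
```

In this example, b is not forced, because a walk can go from a directly to c, while a and c are forced. In an MILP formulation, such guaranteed elements are the kind of information that allows variables to be fixed in advance.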
