Polynomial-Time Solutions for Longest Common Subsequence Related Problems Between a Sequence and a Pangenome Graph
Xingfu Li, Yongping Wang
TL;DR
The paper addresses the problem of measuring similarity between a query sequence and a pangenome graph by studying the Longest Common Subsequence (LCS) problem and three related variants. It introduces four reductions that transform these problems into the longest-path problem on directed acyclic graphs (DAGs), which establishes polynomial-time solvability. Correctness is proved formally for the four constructions H, H_gap, H_MEM, and H_MSP. The authors analyze the time complexity of these reductions. For example, H has $O(|Q|N)$ vertices, where $N = \sum_{v\in V(G)} |\delta(v)|$, and computing reachability for edge construction with Floyd–Warshall takes $O(n^3)$ time, yielding an overall cost of $O(n^3 + |Q|^2 N^2)$ for H, with similar bounds for the other DAGs. They also prove conditional lower bounds under SETH concerning sub-quadratic solutions and discuss the practical limitations of the cubic-time components, advocating future work toward sub-quadratic algorithms for scalable sequence-to-pangenome analysis.
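The construction of H and its cost can be illustrated with a small Python sketch. It rests on our own reading of the complexity bounds, not on the paper's verbatim definitions. Each vertex of H pairs a query index i with a character position (v, k) in the graph whose characters match, which gives $O(|Q|N)$ vertices. An edge joins two such pairs when the query index increases and the second position can follow the first along a directed walk in G, using reachability from a Floyd–Warshall-style (Warshall) transitive closure in $O(n^3)$. The LCS length is then the number of vertices on a longest path in H. The function names `reachability` and `lcs_sequence_graph` are hypothetical. For cyclic graphs, the walk semantics used here may differ from the paper's exact definition.

```python
def reachability(n, edges):
    """Warshall's transitive closure in O(n^3): reach[u][v] is True iff v is
    reachable from u by a directed walk of at least one edge."""
    reach = [[False] * n for _ in range(n)]
    for u, v in edges:
        reach[u][v] = True
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


def lcs_sequence_graph(query, labels, edges):
    """Hypothetical sketch: LCS length between `query` and the best spelling of
    a walk in a node-labeled graph (labels[v] is the string on node v)."""
    reach = reachability(len(labels), edges)

    # Vertices of H: (i, v, k) with query[i] == labels[v][k]; there are O(|Q| N).
    # They are generated in nondecreasing i, which is a topological order of H
    # because every edge of H goes from a smaller to a larger query index.
    verts = [(i, v, k)
             for i, ch in enumerate(query)
             for v, lab in enumerate(labels)
             for k, c in enumerate(lab) if c == ch]

    def follows(v0, k0, v, k):
        # Can character position (v, k) come after (v0, k0) on a walk in G?
        return (v0 == v and k > k0) or reach[v0][v]

    # best[b] = number of vertices on a longest path of H ending at verts[b].
    best = [1] * len(verts)
    for b, (i, v, k) in enumerate(verts):
        for a in range(b):  # scans O(|V(H)|^2) = O(|Q|^2 N^2) candidate edges
            i0, v0, k0 = verts[a]
            if i0 < i and follows(v0, k0, v, k):
                best[b] = max(best[b], best[a] + 1)
    return max(best, default=0)
```

Consider a graph with labels `["AC", "G", "TT", "A"]` and edges 0→1, 0→3, 1→2, 3→2, which spells the paths ACGTT and ACATT. For this graph, `lcs_sequence_graph("ACGT", labels, edges)` returns 4. The pairwise edge scan is deliberately naive so that its cost mirrors the $O(|Q|^2 N^2)$ edge bound quoted above.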
Abstract
A pangenome captures the genetic diversity across multiple individuals simultaneously, providing a more comprehensive reference for genome analysis than a single linear genome, which may introduce allele bias. A widely adopted pangenome representation is a node-labeled directed graph, wherein the paths correspond to plausible genomic sequences within a species. Consequently, evaluating sequence-to-pangenome graph similarity constitutes a fundamental task in pangenome construction and analysis. This study explores the Longest Common Subsequence (LCS) problem and three of its variants involving a sequence and a pangenome graph. We present four polynomial-time reductions that transform these LCS-related problems into the longest path problem in a directed acyclic graph (DAG). These reductions demonstrate that all four problems can be solved in polynomial time, establishing their membership in the complexity class P.
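All four reductions target the longest path problem in a DAG. This problem can be solved in time linear in the number of vertices and edges by relaxing edges in topological order. The minimal, generic Python sketch below uses Kahn's algorithm for the ordering. The function name `dag_longest_path` and the weighted edge-list input format are our own assumptions, not taken from the paper.

```python
from collections import deque


def dag_longest_path(n, edges):
    """Return (weight, path) of a maximum-weight path in a DAG with vertices
    0..n-1; `edges` is a list of (u, v, w) triples. Runs in O(n + |edges|)."""
    if n == 0:
        return 0, []
    adj = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v, w in edges:
        adj[u].append((v, w))
        indeg[v] += 1

    # Kahn's algorithm: repeatedly emit a vertex with no remaining in-edges.
    order = []
    queue = deque(u for u in range(n) if indeg[u] == 0)
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != n:
        raise ValueError("graph contains a cycle")

    # dist[v] = best weight of a path ending at v (a path may start anywhere).
    dist = [0] * n
    parent = [-1] * n
    for u in order:  # all predecessors of u are final before u is processed
        for v, w in adj[u]:
            if dist[u] + w > dist[v]:
                dist[v] = dist[u] + w
                parent[v] = u

    # Walk parent pointers back from the best endpoint to recover the path.
    end = max(range(n), key=lambda v: dist[v])
    path = []
    v = end
    while v != -1:
        path.append(v)
        v = parent[v]
    return dist[end], path[::-1]
```

For example, `dag_longest_path(4, [(0, 1, 1), (1, 2, 1), (0, 2, 1), (2, 3, 2)])` returns `(4, [0, 1, 2, 3])`. Because the ordering is computed explicitly, a cyclic input is reported as an error rather than producing a wrong answer.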
