Polynomial-Time Solutions for Longest Common Subsequence Related Problems Between a Sequence and a Pangenome Graph

Xingfu Li, Yongping Wang

TL;DR

The paper addresses the problem of measuring similarity between a query sequence and a pangenome graph by studying the Longest Common Subsequence (LCS) problem and three related variants. It introduces four reductions, realized as the DAG constructions H, H_gap, H_MEM, and H_MSP and each proved correct, that transform these problems into the longest-path problem on directed acyclic graphs (DAGs), thereby establishing polynomial-time solvability. The authors analyze the time complexity of these reductions, showing, for example, that H has $O(|Q|N)$ vertices with $N = \sum_{v\in V(G)} |\delta(v)|$ and that reachability-based edge construction takes $O(n^3)$ time via Floyd–Warshall, yielding an overall cost of $O(n^3 + |Q|^2 N^2)$ for H, with similar bounds for the other DAGs. They also prove conditional lower bounds under the Strong Exponential Time Hypothesis (SETH) for sub-quadratic solutions and discuss practical limitations due to the cubic-time components, advocating future work toward sub-quadratic algorithms for scalable sequence-to-pangenome analysis.
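The $O(n^3)$ reachability step mentioned above can be illustrated with a Floyd–Warshall style transitive closure. This is a minimal sketch, not the authors' implementation: the function name `transitive_closure`, the integer vertex ids, and the edge-list input are assumptions made for illustration, and the summary does not define $n$ (it is taken here to be the number of vertices of the input graph).

```python
def transitive_closure(n, edges):
    """Return reach, where reach[u][v] is True iff v is reachable
    from u along a path with at least one edge.

    n     -- number of vertices, labelled 0..n-1 (assumed encoding)
    edges -- iterable of directed edges (u, v)
    """
    reach = [[False] * n for _ in range(n)]
    for u, v in edges:
        reach[u][v] = True

    # Warshall's triple loop: allow vertex k as an intermediate stop.
    for k in range(n):
        row_k = reach[k]
        for i in range(n):
            if reach[i][k]:
                row_i = reach[i]
                for j in range(n):
                    if row_k[j]:
                        row_i[j] = True
    return reach


if __name__ == "__main__":
    closure = transitive_closure(4, [(0, 1), (1, 2)])
    print(closure[0][2])  # vertex 2 is reachable from vertex 0 via 1
```

The three nested loops over all vertices are what produce the cubic term in the bounds quoted above, which is the component the summary identifies as a practical limitation.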

Abstract

A pangenome captures the genetic diversity across multiple individuals simultaneously, providing a more comprehensive reference for genome analysis than a single linear genome, which may introduce allele bias. A widely adopted pangenome representation is a node-labeled directed graph, wherein the paths correspond to plausible genomic sequences within a species. Consequently, evaluating sequence-to-pangenome graph similarity constitutes a fundamental task in pangenome construction and analysis. This study explores the Longest Common Subsequence (LCS) problem and three of its variants involving a sequence and a pangenome graph. We present four polynomial-time reductions that transform these LCS-related problems into the longest path problem in a directed acyclic graph (DAG). These reductions demonstrate that all four problems can be solved in polynomial time, establishing their membership in the complexity class P.
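The target of all four reductions, the longest path problem in a DAG, is solvable in time linear in the size of the DAG by dynamic programming over a topological order. The sketch below shows that standard technique under stated assumptions: the name `longest_path_dag`, the integer vertex ids, and measuring path length by number of vertices are illustrative choices, not details taken from the paper.

```python
from collections import deque


def longest_path_dag(n, edges):
    """Return one longest path (as a list of vertices) in a DAG.

    n     -- number of vertices, labelled 0..n-1 (assumed encoding)
    edges -- iterable of directed edges (u, v)
    Raises ValueError if the graph contains a cycle.
    """
    adj = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1

    best = [1] * n    # best[v]: vertices on the longest path ending at v
    pred = [-1] * n   # pred[v]: previous vertex on that path, -1 if none
    queue = deque(v for v in range(n) if indeg[v] == 0)
    seen = 0

    # Kahn's algorithm: process vertices in topological order.
    while queue:
        u = queue.popleft()
        seen += 1
        for v in adj[u]:
            if best[u] + 1 > best[v]:
                best[v] = best[u] + 1
                pred[v] = u
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)

    if seen != n:
        raise ValueError("graph contains a cycle")
    if n == 0:
        return []

    # Walk predecessor links back from the best endpoint.
    end = max(range(n), key=best.__getitem__)
    path = []
    while end != -1:
        path.append(end)
        end = pred[end]
    return path[::-1]


if __name__ == "__main__":
    print(longest_path_dag(5, [(0, 1), (1, 2), (0, 2), (2, 3)]))
```

Because this step runs in $O(|V| + |E|)$ time on the constructed DAG, the overall cost of each reduction is governed by the size of that DAG and the work needed to build it.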

Paper Structure

This paper contains 8 sections, 16 theorems, 2 equations, 2 algorithms.

Key Result

Lemma 1

There is an LCS between $Q_1$ and $Q_2$ with length at least $\ell$ if and only if there is an LCS between $Q_1$ and $\hat{G}$ with length at least $\ell$.
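For reference, the sequence-to-sequence side of this equivalence is the classic LCS problem, whose length can be computed with the textbook quadratic dynamic program. The sketch below covers only that side; the graph $\hat{G}$ is not defined in this excerpt, so nothing here should be read as its construction, and the function name `lcs_length` is an illustrative assumption.

```python
def lcs_length(q1, q2):
    """Return the length of a longest common subsequence of q1 and q2,
    using the standard O(|q1| * |q2|) dynamic program with two rows."""
    prev = [0] * (len(q2) + 1)  # row for the previous prefix of q1
    for a in q1:
        cur = [0]               # cur[j]: LCS length of current q1 prefix and q2[:j]
        for j, b in enumerate(q2, start=1):
            if a == b:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    # The lemma's condition "length at least l" is a threshold on this value.
    print(lcs_length("AGGTAB", "GXTXAYB") >= 4)
```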

Theorems & Definitions (32)

  • Lemma 1
  • proof
  • Theorem 1
  • proof
  • Corollary 1
  • proof
  • Lemma 2
  • proof
  • Lemma 3
  • proof
  • ...and 22 more