Improved Approximation Algorithms and Hardness Results for Shortest Common Superstring with Reverse Complements

Ryosuke Yamano, Tetsuo Shibuya

Abstract

The Shortest Common Superstring (SCS) problem is a fundamental task in sequence analysis. In genome assembly, however, the double-stranded nature of DNA implies that each fragment may occur either in its original orientation or as its reverse complement. This motivates the Shortest Common Superstring with Reverse Complements (SCS-RC) problem, which asks for a shortest string that contains, for each input string, either the string itself or its reverse complement as a substring. The previously best-known approximation ratio for SCS-RC was $\frac{23}{8}$. In this paper, we present a new approximation algorithm achieving an improved ratio of $\frac{8}{3}$. Our approach computes an optimal constrained cycle cover by reducing the problem, via a novel gadget construction, to a maximum-weight perfect matching in a general graph. We also investigate the computational hardness of SCS-RC. While the decision version is known to be NP-complete, no explicit inapproximability results were previously established. We show that the hardness of SCS carries over to SCS-RC through a polynomial-time reduction, implying that it is NP-hard to approximate SCS-RC within a factor better than $\frac{333}{332}$. Notably, this hardness result holds even for the DNA alphabet.
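The covering condition in the SCS-RC definition can be made concrete with a small checker. This is only an illustrative sketch, not part of the paper's algorithm: the names `reverse_complement` and `is_scs_rc_superstring` are hypothetical, and the input is assumed to be over the DNA alphabet {A, C, G, T}.

```python
# Illustrative sketch only; function names are assumptions, not from the paper.
# Assumes every string is over the DNA alphabet {A, C, G, T}.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(s: str) -> str:
    """Complement each base (A<->T, C<->G), then reverse the string."""
    return s.translate(COMPLEMENT)[::-1]


def is_scs_rc_superstring(candidate: str, fragments: list[str]) -> bool:
    """Check the SCS-RC covering condition: each fragment must occur in
    the candidate either as itself or as its reverse complement."""
    return all(
        f in candidate or reverse_complement(f) in candidate
        for f in fragments
    )


# "GTT" is the reverse complement of "AAC", so the single string "AAC"
# covers both fragments under the SCS-RC condition.
fragments = ["AAC", "GTT"]
print(is_scs_rc_superstring("AAC", fragments))
```

For these two fragments, an ordinary common superstring must contain both "AAC" and "GTT" literally, and since they do not overlap it needs six characters (for example "AACGTT"). Under SCS-RC, three characters suffice. The checker only verifies feasibility; it does not search for a shortest solution.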

Paper Structure

This paper contains 6 sections, 8 theorems, 8 equations.

Key Result

Theorem 1

The SCS-RC problem admits an $\frac{8}{3}$-approximation algorithm.
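Under the usual meaning of an approximation ratio for a minimization problem, the statement can be read as follows. The symbols $\mathrm{ALG}(S)$ and $\mathrm{OPT}(S)$ are notation assumed here for the length of the algorithm's output and the length of a shortest valid string on input $S$:

$$|\mathrm{ALG}(S)| \le \tfrac{8}{3}\,|\mathrm{OPT}(S)| \quad \text{for every input } S.$$

Compared with the previous ratio, $\frac{23}{8} = 2.875$ and $\frac{8}{3} \approx 2.667$, so the worst-case guarantee improves by $\frac{23}{8} - \frac{8}{3} = \frac{5}{24} \approx 0.208$.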

Theorems & Definitions (9)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Corollary 4
  • Definition 5: Shortest Common Superstring with Reverse Complements (SCS-RC)
  • Lemma 6 [Blum.et.al]
  • Lemma 7 [Blum.et.al, JIANG1992195]
  • Lemma 8: Overlap Rotation Lemma [Breslaure.1997.OverlapRotationLemma]
  • Lemma 9 [Breslaure.1997.OverlapRotationLemma, YamanoShibuya2026SCSRC]