A Simple and Efficient Algorithm for Sorting Signed Permutations by Reversals
Krister M. Swenson
TL;DR
The paper tackles the problem of sorting a signed permutation into the identity using a minimum-length sequence of reversals, a foundational task in computational genomics. It reinterprets the problem through the overlap graph and a recovery framework for unsafe reversals, linking reversals to local graph complementations and using a binary search tree (splay tree) representation to manage good pairs efficiently. The main contribution is an $O(n \log n)$ worst-case algorithm that is lightweight to implement and relies on a recoverable sequence of good reversals, supported by a specialized data structure and a backtracking recovery mechanism. By reaching what some consider the likely lower bound, this work closes the gap left by earlier algorithms and makes large-scale genome rearrangement studies more practical.
Abstract
In 1937, biologists Sturtevant and Tan posed a computational question: transform a chromosome, represented by a permutation of genes, into a second permutation using a minimum-length sequence of reversals, each inverting the order of a contiguous subset of elements. Solutions to this problem, applied to Drosophila chromosomes, were computed by hand. The first algorithmic result was a heuristic published in 1982. In the 1990s, a more biologically relevant version of the problem, where the elements have signs that are also inverted by a reversal, finally received serious attention from the computer science community. This effort eventually resulted in the first polynomial-time algorithm for Signed Sorting by Reversals. Since then, a dozen more articles have been dedicated to simplifying the theory and developing algorithms with improved running times. The current best algorithm, which runs in $O(n \log^2 n / \log\log n)$ time, fails to meet what some consider to be the likely lower bound of $\Omega(n \log n)$. In this article, we present the first algorithm that runs in $O(n \log n)$ time in the worst case. The algorithm is fairly simple to implement, and the running time hides very low constants.
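To make the basic operation concrete, here is a minimal Python sketch of a single signed reversal: the chosen contiguous segment is reversed in order and every element in it changes sign. The names `reverse_signed` and `is_identity`, the 0-based inclusive indices, and the example permutation are illustrative assumptions, not taken from the article. The sketch shows only the operation itself, not the $O(n \log n)$ sorting algorithm, and the two-step sequence below is simply one sequence that sorts the example.

```python
from typing import List


def reverse_signed(perm: List[int], i: int, j: int) -> List[int]:
    """Return a copy of perm with positions i..j (0-based, inclusive)
    reversed in order and negated in sign."""
    if not 0 <= i <= j < len(perm):
        raise ValueError("need 0 <= i <= j < len(perm)")
    # Reverse the segment, then flip the sign of each element in it.
    segment = [-x for x in reversed(perm[i:j + 1])]
    return perm[:i] + segment + perm[j + 1:]


def is_identity(perm: List[int]) -> bool:
    """True if perm is +1, +2, ..., +n."""
    return perm == list(range(1, len(perm) + 1))


# Hypothetical example: sort (+2, -1, +3) with two reversals.
start = [2, -1, 3]
step1 = reverse_signed(start, 0, 0)   # flips the sign of the first element
step2 = reverse_signed(step1, 0, 1)   # reverses and negates the first two
```

Each call copies the list and therefore costs time linear in $n$; a naive sorter built on it would not achieve the running time discussed above, which is why specialized data structures are needed.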
