An efficient algorithm to compute the minimum free energy of interacting nucleic acid strands

Ahmed Shalaby, Damien Woods

TL;DR

The paper addresses minimum free energy (MFE) prediction for connected, unpseudoknotted, multi-strand nucleic acid structures in the presence of rotational symmetry. This problem remained open for MFE even though partition-function methods were known for a constant number of strands. The approach has two parts: (i) extend the symmetry-naive dynamic program of Dirks et al. to compute a baseline MFE, and (ii) apply a backtracking procedure that exploits a polynomial bound on the number of symmetric configurations, obtained via pizza cuts and a central loop, to identify the true MFE while accounting for the $k_B T\log R$ symmetry penalty. The main result is the first polynomial-time MFE algorithm for $\mathcal{O}(1)$ strands with symmetry, running in $\mathcal{O}(N^4(c-1)!)$ time and $\mathcal{O}(N^4)$ space (with a variant using $\mathcal{O}(N^4\log N(c-1)!)$ time and $\mathcal{O}(N^3)$ space). This matches the asymptotic run time of the partition-function algorithm and gives a path toward efficient multi-stranded MFE computation, despite the underlying NP-hardness when the number of strands grows. Key technical contributions include a linear bound on the number of symmetric backbone cuts, a rigorous “pizza slice” decomposition to manage rotational symmetry, and a backtracking framework that constructs an asymmetric true-MFE structure within a bounded energy window. Together these ideas yield a symmetry-aware MFE solver for small-to-moderate numbers of interacting strands, with potential extensions to larger systems and connections to partition-function analyses.
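To illustrate why the $k_B T\log R$ symmetry penalty can change which structure is the MFE, here is a minimal Python sketch. The candidate energies, the temperature of 310.15 K, and the function name `symmetry_corrected_energy` are illustrative assumptions, not values or names from the paper. The constant is the per-mole form of $k_B$ (the gas constant) in kcal/(mol K).

```python
import math

# Assumed units: gas constant in kcal/(mol*K), the per-mole analogue of k_B.
K_B = 0.0019872


def symmetry_corrected_energy(naive_dg, symmetry_degree, temperature_k=310.15):
    """Add the k_B * T * ln(R) penalty to a symmetry-naive free energy."""
    return naive_dg + K_B * temperature_k * math.log(symmetry_degree)


# Hypothetical candidates: (label, symmetry-naive energy in kcal/mol, degree R).
candidates = [
    ("symmetric, R=4", -10.0, 4),
    ("asymmetric, R=1", -9.5, 1),
]

# Ranking by naive energy alone versus ranking after the symmetry penalty.
naive_best = min(candidates, key=lambda c: c[1])
true_best = min(candidates, key=lambda c: symmetry_corrected_energy(c[1], c[2]))

print("symmetry-naive winner:", naive_best[0])
print("winner after penalty: ", true_best[0])
```

In this toy case the 4-fold symmetric structure wins on naive energy but loses once the penalty is added, which is the situation the backtracking stage has to detect.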

Abstract

The information-encoding molecules RNA and DNA form a combinatorially large set of secondary structures through nucleic acid base pairing. Thermodynamic prediction algorithms predict favoured, or minimum free energy (MFE), secondary structures, and can assign an equilibrium probability to any structure via the partition function: a Boltzmann-weighted sum over the set of secondary structures. MFE is NP-hard in the presence of pseudoknots, base pairings that violate a restricted planarity condition. However, unpseudoknotted structures are amenable to dynamic programming: for a single DNA/RNA strand there are polynomial time algorithms for MFE and partition function. For multiple strands, the problem is more complicated due to entropic penalties. Dirks et al. [SIAM Review, 2007] showed that for $\mathcal{O}(1)$ strands, with $N$ bases, there is a partition function algorithm that runs in time polynomial in $N$; however, their technique did not generalise to MFE, which they left open. We give the first polynomial time ($\mathcal{O}(N^4)$) algorithm for unpseudoknotted multiple ($\mathcal{O}(1)$) strand MFE, answering the open problem from Dirks et al. The challenge lies in considering rotational symmetry of secondary structures, a feature not immediately amenable to dynamic programming algorithms. Our proof has two main technical contributions. First, a polynomial upper bound on the number of symmetric secondary structures to be considered when computing rotational symmetry penalties. Second, that bound is leveraged by a backtracking algorithm to find the MFE in an exponential space of contenders. Our MFE algorithm has the same asymptotic run time as Dirks et al.'s partition function algorithm, suggesting efficient handling of rotational symmetry, although it has higher space complexity. It also seems reasonably tight in the number of strands, since Condon, Hajiaghayi & Thachuk [DNA27, 2021] have shown that unpseudoknotted MFE is NP-hard for $\mathcal{O}(N)$ strands.
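The relationship between the partition function and equilibrium probabilities can be sketched on a toy ensemble. The three structure names and energies below are made up for illustration, and `boltzmann_probabilities` is a hypothetical helper, not the paper's algorithm: it enumerates structures explicitly, which is only feasible for tiny examples.

```python
import math

# Assumed units: gas constant in kcal/(mol*K), the per-mole analogue of k_B.
K_B = 0.0019872


def boltzmann_probabilities(energies, temperature_k=310.15):
    """Return p(s) = exp(-dG(s)/kT) / Q for every structure s in `energies`."""
    kt = K_B * temperature_k
    # Shift by the minimum energy for numerical stability; the shift cancels
    # in the ratio, so the probabilities are unchanged.
    e_min = min(energies.values())
    weights = {s: math.exp(-(dg - e_min) / kt) for s, dg in energies.items()}
    q_shifted = sum(weights.values())
    return {s: w / q_shifted for s, w in weights.items()}


# Hypothetical ensemble: structure label -> free energy in kcal/mol.
toy_energies = {"s1": -3.0, "s2": -2.0, "s3": 0.0}
probs = boltzmann_probabilities(toy_energies)
mfe_structure = min(toy_energies, key=toy_energies.get)

for name, p in probs.items():
    print(f"{name}: {p:.4f}")
print("MFE structure:", mfe_structure)
```

The MFE structure is always the single most probable one under this weighting, but it need not carry most of the probability mass, which is why both quantities are of interest.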

Paper Structure

This paper contains 26 sections, 15 theorems, 13 equations, 4 figures, 1 table, 5 algorithms.

Key Result

Theorem 1

There is an $\mathcal{O}(N^4(c-1)!)$ time and $\mathcal{O}(N^4)$ space algorithm for the Minimum Free Energy unpseudoknotted secondary structure prediction problem, including rotational symmetry, for a set of $c = \mathcal{O}(1)$ DNA or RNA strands of total length $N$ bases.
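The $(c-1)!$ factor in the running time matches the number of distinct circular orderings of $c$ distinguishable strands. A minimal sketch of that count follows; `circular_orderings` is an illustrative helper, and the strand names are borrowed from Figure 1.

```python
from itertools import permutations


def circular_orderings(strands):
    """Enumerate circular orderings by fixing the first strand.

    Fixing one strand removes the c rotations of each cycle, so c distinct
    strands give (c-1)! orderings.
    """
    first, rest = strands[0], strands[1:]
    return [(first,) + p for p in permutations(rest)]


orderings = circular_orderings(("W", "X", "Y", "Z"))
for order in orderings:
    print("".join(order))
print(len(orderings), "orderings")
```

When some strands are identical, several of these orderings coincide, so $(c-1)!$ is an upper bound on the number of distinct orderings in that case.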

Figures (4)

  • Figure 1: A DNA (or RNA) secondary structure $S$ with $c=4$ strands and two of its $(c-1)!=6$ polymer graphs. (a) One of the many possible secondary structures for four DNA strands $W,X,Y,Z$. Short black lines represent DNA bases (a few are shown $\ldots \mathrm{C}, \mathrm{G}, \mathrm{C}, \mathrm{A} \ldots$), and long lines represent base pairs (drawing not to scale). Loops are colour-coded as follows: stack=purple, multiloop=yellow, hairpin=red, bulge=light blue, internal=dark blue, external=grey. Black arrow: the small gap between two strands is called a nick. (b) Polymer graph for the strand ordering $\pi' = WZXY$, denoted $\mathrm{Poly}(S,\pi')$, showing base-pair crossings. (c) By reordering to $\pi = WXYZ$ we get another polymer graph $\mathrm{Poly}(S,\pi)$ for $S$, without crossings, hence $S$ is unpseudoknotted.
  • Figure 2: Three secondary structures with their associated polymer graphs. In each case, there is a single complex with four identical (indistinguishable) strands of type $X$, but with a different symmetry degree $R$. (a) Symmetry degree $R = 4$ (rotation by $90^{\circ}$ gives the same secondary structure). (b) Symmetry degree $R = 2$ (rotation by $180^{\circ}$ gives the same secondary structure). (c) Symmetry degree $R = 1$ (asymmetric secondary structure).
  • Figure 3: Slicing and swapping strategy for constructing a new asymmetric structure by combining two symmetric structures with the same symmetric backbone cut. (a) 4-fold symmetric secondary structure $S_i$, with admissible $4$-symmetric backbone cut $\mathcal{C}_R^b$. Black arrows indicate the four covalent bonds forming $\mathcal{C}_R^b$, generated by the covalent bond $b$. (b) 4-fold symmetric secondary structure $S_j$, sharing the same cut $\mathcal{C}_R^b$ as $S_i$. Black arrows indicate the four covalent bonds forming $\mathcal{C}_R^b$. (c) Asymmetric secondary structure $S_k$ that is constructed by replacing the grey shaded 'slice' from $S_i$ by its corresponding slice from $S_j$, using the proof of the lemma labelled lem:sand in the paper.
  • Figure 4: snMFE dynamic program recursion diagrams (left) and recursion equations (right). A solid straight line indicates a base pair and a dashed line demarcates a region without implying that the connected bases are paired. Shaded regions correspond to loop free energies that are explicitly incorporated at the current level of recursion. See [dirks2003partition, fornace2020unified] for full details.
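The crossing test described in the Figure 1 caption can be sketched as follows: lay the strands out in a chosen order, map each base to a position, and count pairs of base pairs whose endpoints interleave. The three short strands, their lengths, and the base pairs are a made-up toy example, and `count_crossings` is an illustrative helper rather than code from the paper.

```python
from itertools import combinations


def base_positions(order, lengths):
    """Map each (strand, base_index) to its position in the given ordering."""
    positions, offset = {}, 0
    for strand in order:
        for i in range(lengths[strand]):
            positions[(strand, i)] = offset + i
        offset += lengths[strand]
    return positions


def count_crossings(order, lengths, pairs):
    """Count pairs of base pairs that cross in the polymer graph for `order`."""
    pos = base_positions(order, lengths)
    arcs = [tuple(sorted((pos[a], pos[b]))) for a, b in pairs]
    crossings = 0
    for (i, j), (k, l) in combinations(arcs, 2):
        # Two arcs cross exactly when their endpoints interleave.
        if i < k < j < l or k < i < l < j:
            crossings += 1
    return crossings


# Hypothetical toy structure: three strands of two bases each.
lengths = {"A": 2, "B": 2, "C": 2}
pairs = [(("A", 1), ("B", 0)), (("A", 0), ("C", 1))]

print("ABC:", count_crossings(("A", "B", "C"), lengths, pairs))
print("ACB:", count_crossings(("A", "C", "B"), lengths, pairs))
```

Here the ordering ABC gives a crossing-free polymer graph while ACB does not. Since at least one ordering has no crossings, this toy structure counts as unpseudoknotted in the sense of the caption.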

Theorems & Definitions (49)

  • Theorem 1
  • Definition 2: Secondary structure $S$
  • Definition 3: Polymer graph
  • Definition 4: Unpseudoknotted secondary structure
  • Remark 5
  • Remark 6: $S$, or $\mathrm{Poly}(S,\pi)$
  • Definition 7: Symmetry degree of a permutation
  • Remark 8: Notation: $X_m^n$
  • Definition 9: $R$-fold rotational symmetric structure
  • Remark 10
  • ...and 39 more
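The symmetry degrees shown in Figure 2 can be reproduced on toy inputs with the sketch below. It assumes that the symmetry degree $R$ of a structure is the number of rotations of the circular strand ordering that map both the strand types and the set of base pairs onto themselves. The paper's formal definitions are not reproduced on this page, so this reading, the helper `symmetry_degree`, and the example base pairs are all assumptions.

```python
def symmetry_degree(strand_types, pairs):
    """Count rotations (by whole strands) that leave the structure unchanged.

    strand_types: strand type labels in circular order, e.g. ["X"] * 4.
    pairs: iterable of ((strand_idx, base_idx), (strand_idx, base_idx)).
    """
    c = len(strand_types)
    pair_set = {frozenset(p) for p in pairs}
    degree = 0
    for k in range(c):
        # The rotation must map every strand onto a strand of the same type.
        if any(strand_types[s] != strand_types[(s + k) % c] for s in range(c)):
            continue
        rotated = {frozenset(((s + k) % c, i) for s, i in p) for p in pair_set}
        if rotated == pair_set:
            degree += 1
    return degree


# Hypothetical structures on four identical strands of three bases each.
types = ["X", "X", "X", "X"]
ring = [((s, 2), ((s + 1) % 4, 0)) for s in range(4)]          # expected R = 4
two_fold = [((0, 2), (1, 0)), ((2, 2), (3, 0)), ((1, 1), (3, 1))]  # expected R = 2
chain = [((0, 2), (1, 0)), ((1, 2), (2, 0)), ((2, 2), (3, 0))]  # expected R = 1

for name, structure in [("ring", ring), ("two_fold", two_fold), ("chain", chain)]:
    print(name, symmetry_degree(types, structure))
```

The identity rotation always counts, so the result is at least 1, and an asymmetric structure is one where no other rotation qualifies.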