Probabilistic RNA Designability via Interpretable Ensemble Approximation and Dynamic Decomposition

Tianshuo Zhou, David H. Mathews, Liang Huang

TL;DR

This work introduces a theory of ensemble approximation and a probability decomposition framework for bounding the folding probabilities of RNA structures in an explainable way. It also develops a linear-time dynamic programming algorithm that efficiently searches over exponentially many decompositions and identifies the one that yields the tightest probabilistic bound for a given structure.

Abstract

Motivation: RNA design aims to find RNA sequences that fold into a given target secondary structure, a problem also known as RNA inverse folding. However, not all target structures are designable. Recent advances in RNA designability have focused primarily on minimum free energy (MFE)-based criteria, while ensemble-based notions of designability remain largely underexplored. To address this gap, we introduce a theory of ensemble approximation and a probability decomposition framework for bounding the folding probabilities of RNA structures in an explainable way. We further develop a linear-time dynamic programming algorithm that efficiently searches over exponentially many decompositions and identifies the optimal one that yields the tightest probabilistic bound for a given structure.

Results: Applying our methods to both native and artificial RNA structures in the ArchiveII and Eterna100 benchmarks, we obtained probability bounds that are much tighter than those of prior approaches. In addition, our methods provide anatomical tools for analyzing RNA structures and understanding the sources of design difficulty at the motif level.

Availability: Source code and data are available at https://github.com/shanry/RNA-Undesign.

Supplementary information: Supplementary text and data are available in a separate PDF.

Paper Structure

This paper contains 21 sections, 3 theorems, 29 equations, 11 figures, 5 tables, 4 algorithms.

Key Result

Theorem 1

If a structure $\boldsymbol{y^\star}$ can be decomposed into a set of non-overlapping motifs $\boldsymbol{M}=\{\boldsymbol{m}_1,\boldsymbol{m}_2,\ldots,\boldsymbol{m}_C\}$, then for any sequence $\boldsymbol{x}$ …

Figures (11)

  • Figure 2: Ensemble approximation with a single rival structure.
  • Figure 3: Example of ensemble approximation with multiple rival structures.
  • Figure 4: Example of structure decomposition for the Eterna100 structure "multilooping fun", which is decomposed into 3 motifs $\boldsymbol{m}_a$, $\boldsymbol{m}_b$, and $\boldsymbol{m}_c$, highlighted in different colors.
  • Figure 5: A motif $\boldsymbol{m}$ is split into $\boldsymbol{m}_a$ and $\boldsymbol{m}_b$ at the base pair $(i, j)$.
  • Figure 6: Three different decompositions for the same structure shown in Fig. \ref{fig:decomposition_eterna57}.
  • ...and 6 more figures

Theorems & Definitions (6)

  • Theorem 1: Probability decomposition over non-overlapping motifs
  • Corollary 1
  • Theorem 2
  • Definition 1
  • Definition 2
  • Definition 3