Probabilistic RNA Designability via Interpretable Ensemble Approximation and Dynamic Decomposition

Tianshuo Zhou; David H. Mathews; Liang Huang

TL;DR

This paper introduces a theory of ensemble approximation and a probability decomposition framework for bounding the folding probabilities of RNA structures in an explainable way. It also develops a linear-time dynamic programming algorithm that efficiently searches over exponentially many decompositions and identifies the one yielding the tightest probabilistic bound for a given structure.

Abstract

Motivation: RNA design aims to find RNA sequences that fold into a given target secondary structure, a problem also known as RNA inverse folding. However, not all target structures are designable. Recent advances in RNA designability have focused primarily on minimum free energy (MFE)-based criteria, while ensemble-based notions of designability remain largely underexplored. To address this gap, we introduce a theory of ensemble approximation and a probability decomposition framework for bounding the folding probabilities of RNA structures in an explainable way. We further develop a linear-time dynamic programming algorithm that efficiently searches over exponentially many decompositions and identifies the optimal one, which yields the tightest probabilistic bound for a given structure.

Results: Applying our methods to both native and artificial RNA structures in the ArchiveII and Eterna100 benchmarks, we obtained probability bounds that are much tighter than those of prior approaches. Our methods also provide anatomical tools for analyzing RNA structures and understanding the sources of design difficulty at the motif level.

Availability: Source code and data are available at https://github.com/shanry/RNA-Undesign.

Supplementary information: Supplementary text and data are available in a separate PDF.
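The idea of bounding a folding probability from a small part of the ensemble can be illustrated with a minimal sketch. It relies only on the standard Boltzmann form of structure probability: the partition function sums weights over every structure, so summing over the target plus a few rival structures can only underestimate it, and the resulting ratio is an upper bound on the target's probability. The function name, the RT constant, and the free energies below are hypothetical illustrations, not values or code from the paper or its repository.

```python
import math

# Assumed thermal energy RT in kcal/mol at about 37 °C (illustrative constant).
RT = 0.6163


def probability_upper_bound(target_energy, rival_energies, rt=RT):
    """Upper-bound the Boltzmann probability of a target structure.

    target_energy:  free energy of the target structure for some sequence.
    rival_energies: free energies of distinct rival structures for the
                    same sequence (each different from the target).
    Summing Boltzmann weights over this subset underestimates the full
    partition function, so the returned ratio can only overestimate the
    true probability of the target.
    """
    energies = [target_energy] + list(rival_energies)
    e_min = min(energies)  # shift energies for numerical stability
    weights = [math.exp(-(e - e_min) / rt) for e in energies]
    return weights[0] / sum(weights)


# Hypothetical energies (kcal/mol): each extra rival can only tighten the bound.
one_rival = probability_upper_bound(-5.0, [-6.0])
two_rivals = probability_upper_bound(-5.0, [-6.0, -4.0])
print(one_rival, two_rivals)
```

With no rivals the bound is the trivial value 1, and a rival whose energy equals the target's caps the probability at one half. How rivals are found and how bounds for separate motifs are combined is the subject of the paper's own sections and is not reproduced here.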

Paper Structure (21 sections, 3 theorems, 29 equations, 11 figures, 5 tables, 4 algorithms)

Ensemble Approximation via Rival Search
Ensemble Approximation with a Single Rival Structure (Motif)
Ensemble Approximation with Multiple Rival Structures (Motifs)
Complexity Analysis
Linear-time Dynamic Programming over Exponentially Many Decompositions
Structure and Probability Decomposition
Optimal Decomposition Search by Linear-Time Dynamic Programming
Experiments
Settings
Implementation
Datasets
Overall Results
Detailed Results
Ablation Study
Designability Dissection
...and 6 more sections

Key Result

Theorem 1

If a structure $\boldsymbol{y^\star}$ can be decomposed into a set of non-overlapping motifs $\boldsymbol{M}=\{\boldsymbol{m}_1,\boldsymbol{m}_2,\ldots,\boldsymbol{m}_C\}$, then for any sequence $\boldsymbol{x}$ ...

Figures (11)

Figure 2: Ensemble approximation with a single rival structure.
Figure 3: Example of ensemble approximation with multiple rival structures.
Figure 4: Example of structure decomposition for the Eterna100 structure "multilooping fun", which is decomposed into 3 motifs $\boldsymbol{m}_a$, $\boldsymbol{m}_b$, and $\boldsymbol{m}_c$, highlighted in different colors.
Figure 5: A motif $\boldsymbol{m}$ is split into $\boldsymbol{m}_a$ and $\boldsymbol{m}_b$ at the base pair $(i, j)$.
Figure 6: Three different decompositions for the same structure shown in Fig. \ref{fig:decomposition_eterna57}.
...and 6 more figures

Theorems & Definitions (6)

Theorem 1: Probability decomposition over non-overlapping motifs
Corollary 1
Theorem 2
Definition 1
Definition 2
Definition 3
