Scalable and Interpretable Identification of Minimal Undesignable RNA Structure Motifs with Rotational Invariance

Tianshuo Zhou, Wei Yu Tang, Apoorv Malik, David H. Mathews, Liang Huang

TL;DR

The paper tackles the undesignability of RNA secondary structures under the Turner energy model by introducing a motif-centric framework that isolates minimal undesignable motifs. It develops rival-motif-based criteria, a loop-pair graph representation, and a scalable FastMotif algorithm to identify and classify motifs under constrained folding, achieving explainability and efficiency. Across the Eterna100 and ArchiveII datasets, the approach uncovers 355 unique minimal undesignable motifs (24 in Eterna100 and 331 in ArchiveII), revealing limits of the current energy model and providing a public web server for motif exploration. This work advances interpretability in RNA design, enabling targeted model refinement and broader applicability to loop-based folding paradigms while highlighting pathways for future algorithmic and parameterization improvements.

Abstract

RNA design aims to find a sequence that folds with highest probability into a designated target structure. However, certain structures are undesignable, meaning no sequence can fold into the target structure under the default (Turner) RNA folding model. Understanding the specific local structures (i.e., "motifs") that contribute to undesignability is crucial for refining RNA folding models and determining the limits of RNA designability. Despite its importance, this problem has received very little attention, and previous efforts are neither scalable nor interpretable. We develop a new theoretical framework for motif (un-)designability, and design scalable and interpretable algorithms to identify minimal undesignable motifs within a given RNA secondary structure. Our approach establishes motif undesignability by searching for rival motifs, rather than exhaustively enumerating all (partial) sequences that could potentially fold into the motif. Furthermore, we exploit rotational invariance in RNA structures to detect, group, and reuse equivalent motifs and to construct a database of unique minimal undesignable motifs. To achieve that, we propose a loop-pair graph representation for motifs and a recursive graph isomorphism algorithm for motif equivalence. Our algorithms successfully identify 24 unique minimal undesignable motifs among 18 undesignable puzzles from the Eterna100 benchmark. Surprisingly, we also find over 350 unique minimal undesignable motifs and 663 undesignable native structures in the ArchiveII dataset, drawn from a diverse set of RNA families. Our source code is available at https://github.com/shanry/RNA-Undesign and our web server is available at http://linearfold.org/motifs.

Paper Structure (25 sections, 7 theorems, 15 equations, 11 figures, 4 tables, 6 algorithms)

Key Result

Theorem

If a motif $\boldsymbol{m^\star}$ is undesignable, then any motif $\boldsymbol{m}$ such that $\boldsymbol{m^\star} \subseteq \boldsymbol{m}$ is undesignable.
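
One way to read this result is as a pruning rule: once a motif is known to be undesignable, every motif containing it can be marked undesignable without further search. The sketch below illustrates only that containment check. It assumes a hypothetical encoding in which a motif is a set of loop labels (names such as `"H1"` are invented for illustration), and it is not the paper's loop-pair graph representation or its FastMotif algorithm.

```python
from typing import FrozenSet, Iterable

# Hypothetical encoding (an assumption for illustration): a motif is the set
# of labels of the loops it contains, so containment is the subset relation.
Motif = FrozenSet[str]


def implied_undesignable(motif: Motif, known_undesignable: Iterable[Motif]) -> bool:
    """Return True if some known undesignable motif is contained in `motif`.

    By the theorem, containment of an undesignable motif implies that
    `motif` is undesignable too. A False result means only that the
    theorem does not apply; it does not show that `motif` is designable.
    """
    return any(m_star <= motif for m_star in known_undesignable)


# Toy data: one known undesignable motif made of two loops.
known = [frozenset({"H1", "I2"})]

larger = frozenset({"H1", "I2", "M3"})  # contains the known motif
other = frozenset({"H1", "M3"})         # does not contain it

print(implied_undesignable(larger, known))  # True
print(implied_undesignable(other, known))   # False
```

The contrapositive gives the matching rule for designable motifs: if a motif is designable, every motif contained in it is designable as well.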

Figures (11)

  • Figure 1: Illustration of two minimal undesignable motifs from Eterna100 puzzle #52 (motif loops in green, boundary pairs in orange, internal pairs in blue).
  • Figure 2: Example of secondary structure and loops.
  • Figure 3: Motifs with various cardinalities (numbers of loops): $\mathit{card}(\boldsymbol{m}_1) = 1$, $\mathit{card}(\boldsymbol{m}_2) = 2$, $\mathit{card}(\boldsymbol{m}_3) = 3$. Loops are in green, internal pairs ($\mathit{ipairs}$) in orange, and boundary pairs ($\mathit{bpairs}$) in blue.
  • Figure 3: Undesignable (undes.) structures and minimal undesignable (m.u.) motifs in Eterna100 puzzles & native structures from ArchiveII.
  • Figure 4: Example of target motif and rival motif(s). The first target motif is from the structure in Fig. 1; the second target motif is from Eterna100 puzzle "Mat - Elements & Sections", as plotted in the puzzles table.
  • ...and 6 more figures

Theorems & Definitions (21)

  • Definition
  • Definition
  • Definition
  • Definition
  • Definition
  • Definition
  • Theorem
  • Proof
  • Corollary
  • Proof
  • ...and 11 more