Scalable and Interpretable Identification of Minimal Undesignable RNA Structure Motifs with Rotational Invariance
Tianshuo Zhou, Wei Yu Tang, Apoorv Malik, David H. Mathews, Liang Huang
TL;DR
The paper addresses the undesignability of RNA secondary structures under the Turner energy model with a motif-centric framework that isolates minimal undesignable motifs. It establishes undesignability by searching for rival motifs rather than enumerating candidate sequences, represents motifs as loop-pair graphs so that rotationally equivalent motifs can be detected and reused, and introduces the scalable FastMotif algorithm to identify motifs under constrained folding. The approach finds 24 unique minimal undesignable motifs among 18 undesignable puzzles in Eterna100 and over 350 unique minimal undesignable motifs in ArchiveII, together with 663 undesignable native structures. These results expose limits of the current energy model, and a public web server supports motif exploration. Because each motif is a small, interpretable local structure, the findings can guide targeted refinement of folding-model parameters, and the framework should carry over to other loop-based folding models.
Abstract
RNA design aims to find a sequence that folds with highest probability into a designated target structure. However, certain structures are undesignable, meaning no sequence can fold into the target structure under the default (Turner) RNA folding model. Understanding the specific local structures (i.e., "motifs") that contribute to undesignability is crucial for refining RNA folding models and determining the limits of RNA designability. Despite its importance, this problem has received very little attention, and previous efforts are neither scalable nor interpretable. We develop a new theoretical framework for motif (un-)designability, and design scalable and interpretable algorithms to identify minimal undesignable motifs within a given RNA secondary structure. Our approach establishes motif undesignability by searching for rival motifs, rather than exhaustively enumerating all (partial) sequences that could potentially fold into the motif. Furthermore, we exploit rotational invariance in RNA structures to detect, group, and reuse equivalent motifs and to construct a database of unique minimal undesignable motifs. To achieve that, we propose a loop-pair graph representation for motifs and a recursive graph isomorphism algorithm for motif equivalence. Our algorithms successfully identify 24 unique minimal undesignable motifs among 18 undesignable puzzles from the Eterna100 benchmark. Surprisingly, we also find over 350 unique minimal undesignable motifs and 663 undesignable native structures in the ArchiveII dataset, drawn from a diverse set of RNA families. Our source code is available at https://github.com/shanry/RNA-Undesign and our web server is available at http://linearfold.org/motifs.
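To make the loop-based view concrete, here is a minimal sketch of how a dot-bracket secondary structure decomposes into loops joined by base pairs. It is an illustration under stated assumptions, not the authors' implementation: the function names `parse_pairs` and `loop_pair_graph`, the dictionary layout, and the coarse loop labels are all hypothetical, and pseudoknots are not handled.

```python
def parse_pairs(structure):
    """Map each paired position to its partner in a dot-bracket string."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            j = stack.pop()
            pairs[i], pairs[j] = j, i
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return pairs


def classify(closing, children, unpaired):
    """Coarse loop label (hypothetical naming, for illustration only)."""
    if closing is None:
        return "external"
    if not children:
        return "hairpin"
    if len(children) == 1:
        return "stack" if unpaired == 0 else "bulge/internal"
    return "multiloop"


def loop_pair_graph(structure):
    """Return {closing_pair: loop_info}; the key None is the external loop.

    Each loop is a node. Every base pair closes one loop and is a child of
    another, so it acts as an edge between those two nodes.
    """
    pairs = parse_pairs(structure)
    loops = {}
    work = [(None, 0, len(structure))]  # (closing pair, scan start, scan end)
    while work:
        closing, start, end = work.pop()
        children, unpaired = [], 0
        i = start
        while i < end:
            if i in pairs:
                j = pairs[i]  # i opens a pair; its partner j lies to the right
                children.append((i, j))
                work.append(((i, j), i + 1, j))  # visit the enclosed loop later
                i = j + 1
            else:
                unpaired += 1
                i += 1
        loops[closing] = {
            "children": children,
            "unpaired": unpaired,
            "kind": classify(closing, children, unpaired),
        }
    return loops


if __name__ == "__main__":
    for closing, info in loop_pair_graph("(.(...).(...))").items():
        print(closing, info)
```

In this sketch a motif would correspond to a connected set of loops in the graph. Deciding whether two motifs are equivalent under rotation would additionally require a graph isomorphism check, which is not shown here.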
