Deciphering RNA Secondary Structure Prediction: A Probabilistic K-Rook Matching Perspective
Cheng Tan, Zhangyang Gao, Hanqun Cao, Xingran Chen, Ge Wang, Lirong Wu, Jun Xia, Jiangbin Zheng, Stan Z. Li
TL;DR
The paper reframes RNA secondary structure prediction as a probabilistic K-Rook matching problem, defining a finite space of valid structures and optimizing to select the best match for a given sequence. RFold introduces a bi-dimensional optimization that decomposes the likelihood into row-wise and column-wise components, guaranteeing output validity and enabling efficient inference. Across standard benchmarks, generalization tests, long-range predictions, and pseudoknots, RFold achieves competitive or superior accuracy while outperforming many baselines in speed. The approach demonstrates strong cross-dataset robustness and scalable performance, highlighting the practical potential of a constraint-satisfying, learning-guided structure predictor. However, its strict constraint formulation may trade recall for precision, suggesting future work to balance completeness with validity.
Abstract
The secondary structure of ribonucleic acid (RNA) is more stable and accessible in the cell than its tertiary structure, making it essential for functional prediction. Although deep learning has shown promising results in this field, current methods suffer from poor generalization and high complexity. In this work, we reformulate RNA secondary structure prediction as a K-Rook problem, thereby reducing prediction to probabilistic matching within a finite solution space. Building on this perspective, we introduce RFold, a simple yet effective method that learns to predict the best-matching K-Rook solution for a given sequence. RFold employs a bi-dimensional optimization strategy that decomposes the probabilistic matching problem into row-wise and column-wise components, which reduces the matching complexity and simplifies the solving process while guaranteeing the validity of the output. Extensive experiments demonstrate that RFold achieves competitive performance with inference about eight times faster than state-of-the-art approaches. The code and Colab demo are available at http://github.com/A4Bio/RFold.
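The row-wise and column-wise decomposition described above can be illustrated with a minimal sketch. It assumes a symmetric matrix of pairing scores with zeros on the diagonal and keeps an entry only when it is the largest score in both its row and its column, so the result has at most one pair per base, like non-attacking rooks on a chessboard. The function names, the positive-score threshold, and the example values are hypothetical and are not taken from the RFold code; the sketch also ignores any biological constraints on which bases may pair.

```python
def row_col_argmax(scores):
    """Keep entry (i, j) only if it is the maximum of row i and of column j.

    `scores` is assumed to be a square, symmetric list of lists with zeros
    on the diagonal. Entries with a non-positive score are never selected,
    which leaves the corresponding base unpaired (an assumption of this sketch).
    """
    n = len(scores)
    # Row-wise component: index of the best partner for each row.
    row_best = [max(range(n), key=lambda j: scores[i][j]) for i in range(n)]
    # Column-wise component: index of the best partner for each column.
    col_best = [max(range(n), key=lambda i: scores[i][j]) for j in range(n)]
    return [
        [
            1 if row_best[i] == j and col_best[j] == i and scores[i][j] > 0 else 0
            for j in range(n)
        ]
        for i in range(n)
    ]


def is_valid_rook_placement(matrix):
    """Check symmetry and that every row and column holds at most one 1."""
    n = len(matrix)
    rows_ok = all(sum(row) <= 1 for row in matrix)
    cols_ok = all(sum(matrix[i][j] for i in range(n)) <= 1 for j in range(n))
    symmetric = all(
        matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n)
    )
    return rows_ok and cols_ok and symmetric


# Hypothetical pairing scores for a 4-base sequence.
example_scores = [
    [0.0, 0.1, 0.2, 0.9],
    [0.1, 0.0, 0.7, 0.3],
    [0.2, 0.7, 0.0, 0.1],
    [0.9, 0.3, 0.1, 0.0],
]
contact_map = row_col_argmax(example_scores)
print(contact_map)
print(is_valid_rook_placement(contact_map))
```

Because a 1 can only appear where the row winner and the column winner agree, the output satisfies the one-pair-per-base condition by construction, without a separate repair step. A base whose preferred partner prefers someone else is simply left unpaired, which is consistent with the precision-over-recall trade-off noted in the TL;DR.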
