
Deciphering RNA Secondary Structure Prediction: A Probabilistic K-Rook Matching Perspective

Cheng Tan, Zhangyang Gao, Hanqun Cao, Xingran Chen, Ge Wang, Lirong Wu, Jun Xia, Jiangbin Zheng, Stan Z. Li

TL;DR

The paper reframes RNA secondary structure prediction as a probabilistic K-Rook matching problem, defining a finite space of valid structures and optimizing to select the best match for a given sequence. RFold introduces a bi-dimensional optimization that decomposes the likelihood into row-wise and column-wise components, guaranteeing output validity and enabling efficient inference. Across standard benchmarks, generalization tests, long-range predictions, and pseudoknots, RFold achieves competitive or superior accuracy while outperforming many baselines in speed. The approach demonstrates strong cross-dataset robustness and scalable performance, highlighting the practical potential of a constraint-satisfying, learning-guided structure predictor. However, its strict constraint formulation may trade recall for precision, suggesting future work to balance completeness with validity.

Abstract

The secondary structure of ribonucleic acid (RNA) is more stable and accessible in the cell than its tertiary structure, making it essential for functional prediction. Although deep learning has shown promising results in this field, current methods suffer from poor generalization and high complexity. In this work, we reformulate RNA secondary structure prediction as a K-Rook problem, thereby reducing prediction to probabilistic matching within a finite solution space. Building on this perspective, we introduce RFold, a simple yet effective method that learns to predict the best-matching K-Rook solution for a given sequence. RFold employs a bi-dimensional optimization strategy that decomposes the probabilistic matching problem into row-wise and column-wise components, which reduces the matching complexity and simplifies the solving process while guaranteeing the validity of the output. Extensive experiments demonstrate that RFold achieves competitive performance with inference about eight times faster than state-of-the-art approaches. The code and Colab demo are available at http://github.com/A4Bio/RFold.
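
The row-wise and column-wise decomposition described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names and the toy score matrix are hypothetical, the scores are written by hand rather than produced by a learned model, and any further constraints RFold applies are omitted. The sketch only shows why intersecting a row-wise argmax with a column-wise argmax yields at most one pairing per base, in the spirit of placing non-attacking rooks on a board.

```python
# Illustrative sketch only; names and scores are assumptions, not RFold's code.

def onehot_row_argmax(scores):
    """Mark the highest-scoring column in each row with a 1."""
    n = len(scores)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        j = max(range(n), key=lambda k: scores[i][k])
        out[i][j] = 1
    return out


def onehot_col_argmax(scores):
    """Mark the highest-scoring row in each column with a 1."""
    n = len(scores)
    out = [[0] * n for _ in range(n)]
    for j in range(n):
        i = max(range(n), key=lambda k: scores[k][j])
        out[i][j] = 1
    return out


def bidimensional_match(scores):
    """Keep cell (i, j) only if it wins both its row and its column."""
    rows = onehot_row_argmax(scores)
    cols = onehot_col_argmax(scores)
    n = len(scores)
    return [[rows[i][j] * cols[i][j] for j in range(n)] for i in range(n)]


# Hypothetical symmetric pairing scores for a 4-base sequence.
toy_scores = [
    [0.0, 0.6, 0.2, 0.9],
    [0.6, 0.0, 0.5, 0.3],
    [0.2, 0.5, 0.0, 0.1],
    [0.9, 0.3, 0.1, 0.0],
]

pairing = bidimensional_match(toy_scores)
for row in pairing:
    print(row)
# Only bases 0 and 3 are paired; bases 1 and 2 stay unpaired because
# their row-wise and column-wise winners disagree.
```

Because each one-hot matrix has exactly one 1 per row (or per column), their element-wise product can never place two pairings in the same row or column, and a symmetric score matrix gives a symmetric result.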

Paper Structure (36 sections, 19 equations, 7 figures, 16 tables)


Figures (7)

  • Figure 1: The graph and matrix representation of an RNA secondary structure example.
  • Figure 2: Examples of nested and non-nested secondary structures.
  • Figure 3: The analogy between the symmetric K-Rook arrangement and the RNA secondary structure prediction.
  • Figure 4: The visualization of $\arg\max\mathcal{R}(\boldsymbol{\widehat{H}}) \odot \arg\max\mathcal{C}(\boldsymbol{\widehat{H}})$.
  • Figure 5: The overview model architecture of RFold.
  • ...and 2 more figures