RNAFlow: RNA Structure & Sequence Design via Inverse Folding-Based Flow Matching

Divya Nori, Wengong Jin

TL;DR

RNAFlow tackles protein-conditioned RNA design with a conditional flow matching framework. An inverse folding denoiser serves as the score predictor and is trained, while a pre-trained RF2NA structure predictor stays fixed to reduce computational cost. Through Traj-to-Seq, the model also conditions sequence design on conformational ensembles, so designs reflect RNA dynamics. Empirically, RNAFlow improves native sequence recovery, RMSD, and lDDT over diffusion-based and sequence-only baselines, and it demonstrates effective motif-guided GRK2 aptamer design. The approach provides a scalable, dynamics-aware pipeline for RNA structure-sequence co-design with potential impact on RNA therapeutics. It also highlights that further gains will require more accurate docking of full protein-RNA complexes.

Abstract

The growing significance of RNA engineering in diverse biological applications has spurred interest in developing AI methods for structure-based RNA design. While diffusion models have excelled in protein design, adapting them for RNA presents new challenges due to RNA's conformational flexibility and the computational cost of fine-tuning large structure prediction models. To this end, we propose RNAFlow, a flow matching model for protein-conditioned RNA sequence-structure design. Its denoising network integrates an RNA inverse folding model and a pre-trained RoseTTAFold2NA (RF2NA) network to generate RNA sequences and structures. Integrating inverse folding into the structure denoising process allows us to simplify training by keeping the structure prediction network fixed. We further enhance the inverse folding model by conditioning it on inferred conformational ensembles to model dynamic RNA conformations. Evaluation on protein-conditioned RNA structure and sequence generation tasks demonstrates RNAFlow's advantage over existing RNA design methods.

Paper Structure (23 sections, 11 equations, 5 figures, 4 tables, 2 algorithms)

This paper contains 23 sections, 11 equations, 5 figures, 4 tables, 2 algorithms.

Figures (5)

  • Figure 1: One forward pass in RNAFlow training, during which an inverse folding model is fine-tuned to serve as the flow matching score prediction network. The inverse folding model predicts a denoised RNA sequence from a noised complex backbone graph. The predicted sequence is then folded by RF2NA for sequence and structure supervision.
  • Figure 2: One forward pass during Traj-to-Seq inference. A subset of structures from a flow matching trajectory is encoded by a multi-GNN and pooled in an order-invariant manner to predict an RNA sequence.
  • Figure 3: (A) Top: Structure and sequence design of RNA for interaction with a viral RNA-dependent RNA polymerase (PDB ID: 4K4X). Bottom: Design of RNA for interaction with HIV-1 Rev protein (PDB ID: 4PMI). (B) Ablation study of RNAFlow components. We report RMSD and sequence recovery on the sequence similarity split.
  • Figure 4: RNA structures and sequences designed by RNAFlow for GRK2 binding in the motif-scaffolded setting. The predicted structure is Kabsch-aligned onto the ground truth for visualization. In the sequence designs, green characters mark nucleotides that are correctly recovered from the ground truth. Underlined nucleotides are part of the given binding motif.
  • Figure 5: Distribution of RNA lengths in the processed PDBBind dataset.
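
The sequence recovery reported in Figure 3(B), and the per-nucleotide matches highlighted in Figure 4, can be illustrated with a minimal sketch. It assumes recovery is the fraction of positions at which a designed sequence matches the native one, with both sequences of equal length. The function names and the toy sequences are hypothetical and are not taken from the paper or its code.

```python
def sequence_recovery(designed: str, native: str) -> float:
    """Fraction of positions where the designed nucleotide equals the native one.

    Assumption: both sequences have the same, non-zero length and are compared
    position by position (no alignment step).
    """
    if len(designed) != len(native):
        raise ValueError("designed and native sequences must have equal length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)


def recovered_mask(designed: str, native: str) -> str:
    """Mark recovered positions with '|' and mismatches with '.'.

    This plays the role of the color highlighting described for Figure 4.
    """
    if len(designed) != len(native):
        raise ValueError("designed and native sequences must have equal length")
    return "".join("|" if d == n else "." for d, n in zip(designed, native))


# Toy example with made-up sequences (not from the paper).
native_seq = "GGAACC"
designed_seq = "GGAUCC"
recovery = sequence_recovery(designed_seq, native_seq)
mask = recovered_mask(designed_seq, native_seq)
```

In this toy case, five of the six positions match, so `recovery` is 5/6 and `mask` flags only the fourth nucleotide as a mismatch.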