RiboSphere: Learning Unified and Efficient Representations of RNA Structures

Zhou Zhang, Hanqun Cao, Cheng Tan, Fang Wu, Pheng Ann Heng, Tianfan Fu

Abstract

Accurate RNA structure modeling remains difficult because RNA backbones are highly flexible, non-canonical interactions are prevalent, and experimentally determined 3D structures are comparatively scarce. We introduce RiboSphere, a framework that learns discrete geometric representations of RNA by combining vector quantization with flow matching. Our design is motivated by the modular organization of RNA architecture: complex folds are composed from recurring structural motifs. RiboSphere uses a geometric transformer encoder to produce SE(3)-invariant (rotation/translation-invariant) features, which are discretized with finite scalar quantization (FSQ) into a finite vocabulary of latent codes. Conditioned on these discrete codes, a flow-matching decoder reconstructs atomic coordinates, enabling high-fidelity structure generation. We find that the learned code indices are enriched for specific RNA motifs, suggesting that the model captures motif-level compositional structure rather than acting as a purely compressive bottleneck. Across benchmarks, RiboSphere achieves strong performance in structure reconstruction (RMSD 1.25 Å, TM-score 0.84), and its pretrained discrete representations transfer effectively to inverse folding and RNA–ligand binding prediction, with robust generalization in data-scarce regimes.
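
As a rough illustration of the FSQ step described above, the following Python sketch bounds each latent scalar, rounds it to one of a fixed number of levels, and combines the per-dimension digits into a single code index. The function names, the level configuration [5, 5, 5], and the tanh-based bounding are illustrative assumptions, not the RiboSphere implementation. During training, the rounding is typically paired with a straight-through gradient estimator, which is omitted here.

```python
import math


def fsq_quantize(z, levels):
    """Map each latent scalar z[i] to an integer digit in {0, ..., levels[i] - 1}.

    Simplified sketch: squash with tanh, rescale to [0, levels[i] - 1], round.
    """
    if len(z) != len(levels):
        raise ValueError("z and levels must have the same length")
    digits = []
    for value, n_levels in zip(z, levels):
        unit = (math.tanh(value) + 1.0) / 2.0           # bounded to (0, 1)
        digits.append(int(round(unit * (n_levels - 1))))  # nearest level
    return digits


def fsq_index(digits, levels):
    """Combine per-dimension digits into one code index (mixed-radix encoding)."""
    index, base = 0, 1
    for digit, n_levels in zip(digits, levels):
        index += digit * base
        base *= n_levels
    return index


def codebook_size(levels):
    """The implicit vocabulary size is the product of the per-dimension levels."""
    return math.prod(levels)


if __name__ == "__main__":
    levels = [5, 5, 5]                 # hypothetical configuration
    latent = [0.0, 10.0, -10.0]        # hypothetical encoder output for one residue
    digits = fsq_quantize(latent, levels)
    print(digits, fsq_index(digits, levels), codebook_size(levels))
```

Because the vocabulary is defined implicitly by the grid of levels, no codebook is learned or stored; the number of codes is fixed by the choice of levels alone.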

Paper Structure (45 sections, 34 equations, 5 figures, 5 tables)

Figures (5)

  • Figure 1: Overview of RiboSphere: from continuous RNA geometry to discrete, interpretable structural units.
  • Figure 2: Overall pipeline of RiboSphere. (a) General autoencoder pipeline. RNA atomic coordinates are encoded by a geometric transformer into continuous latent representations, which are discretized via Finite Scalar Quantization (FSQ) to obtain discrete geometric tokens. A flow-matching decoder reconstructs full 3D structures from the discrete representations, enabling high-fidelity structure reconstruction and serving as the pretraining objective. (b) Multi-task downstream integration. The pretrained encoder and quantizer are frozen and reused across downstream tasks. Discrete structural tokens and continuous embeddings are transferred to task-specific architectures for inverse folding and RNA–ligand binding prediction, providing a shared and interpretable geometric backbone.
  • Figure 3: Structural consistency of high-frequency VQ token sequences.
  • Figure 4: Motif-specific token distributions in the VQ structural space.
  • Figure 5: Recovery–Diversity trade-off for the inverse folding model under different sampling temperatures.
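
The flow-matching decoder mentioned in the Figure 2 caption can be illustrated with a minimal Python sketch. It assumes a straight-line probability path between a noise sample x0 and the target coordinates x1, which is a common choice but is not confirmed here as the path RiboSphere uses. All function names are hypothetical, and conditioning on the discrete codes is left to the caller-supplied velocity function.

```python
def interpolate(x0, x1, t):
    """Point on the straight-line path: x_t = (1 - t) * x0 + t * x1."""
    return [[(1.0 - t) * a + t * b for a, b in zip(p0, p1)]
            for p0, p1 in zip(x0, x1)]


def target_velocity(x0, x1):
    """Regression target for the linear path: d x_t / dt = x1 - x0."""
    return [[b - a for a, b in zip(p0, p1)] for p0, p1 in zip(x0, x1)]


def flow_matching_loss(predicted, x0, x1):
    """Mean squared error between a predicted velocity and the target velocity."""
    target = target_velocity(x0, x1)
    errors = [(p - t) ** 2
              for p_row, t_row in zip(predicted, target)
              for p, t in zip(p_row, t_row)]
    return sum(errors) / len(errors)


def euler_sample(velocity_fn, x0, n_steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1 with Euler steps."""
    x = [list(point) for point in x0]
    dt = 1.0 / n_steps
    for step in range(n_steps):
        v = velocity_fn(x, step * dt)
        x = [[c + dt * vc for c, vc in zip(point, v_point)]
             for point, v_point in zip(x, v)]
    return x


if __name__ == "__main__":
    noise = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]    # hypothetical starting sample
    coords = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]   # hypothetical atom coordinates

    # Stand-in for a trained network: returns the exact target velocity.
    def oracle(x, t):
        return target_velocity(noise, coords)

    print(euler_sample(oracle, noise))
```

In a trained model, the oracle would be replaced by a network that takes the current coordinates, the time t, and the discrete structural tokens, and is fit by minimizing the loss above over random t and noise samples.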