A Comprehensive Benchmark for RNA 3D Structure-Function Modeling
Luis Wyss, Vincent Mallet, Wissam Karroucha, Karsten Borgwardt, Carlos Oliver
TL;DR
This work introduces a modular, reproducible benchmarking suite for RNA 3D structure–function modeling, built on the rnaglib framework, that addresses the lack of standardized evaluation for RNA structure encoders. It defines seven tasks (GO tagging, inverse folding, dynamic modification detection, complex proximity, ligand-adjacent sites, pocket classification, and affinity-based screening), each with curated datasets, redundancy-aware splits, and consistent evaluation, so that models can be compared fairly. Baseline results with simple relational graph networks show that multi-level representations, especially 2.5D graphs, substantially outperform sequence-only approaches, and that 3D encoders can offer additional gains. Together these results highlight the value of incorporating 3D geometry and base-pair context. The framework is designed to be reproducible, scalable, and extensible, and it points to future directions such as dynamic conformational modeling and transferable RNA embeddings that could accelerate progress in RNA structure–function understanding and design.
Abstract
The relationship between RNA structure and function has recently attracted interest within the deep learning community, a trend expected to intensify as nucleic acid structure models advance. Despite this momentum, the lack of standardized, accessible benchmarks for applying deep learning to RNA 3D structures hinders progress. To address this gap, we introduce a collection of seven benchmarking datasets specifically designed to support RNA structure-function prediction. Built on top of the established Python package rnaglib, our library streamlines data distribution and encoding, provides tools for dataset splitting and evaluation, and offers a comprehensive, user-friendly environment for model comparison. The modular and reproducible design of our datasets encourages community contributions and enables rapid customization. To demonstrate the utility of our benchmarks, we report baseline results for all tasks using a relational graph neural network.
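
To make the idea of a redundancy-aware split concrete, the following is a minimal sketch, not the rnaglib API. It assumes each RNA has already been assigned a cluster label by some similarity criterion (the `cluster_of` mapping and the `cluster_split` function are hypothetical names introduced for illustration) and places whole clusters into train, validation, and test sets so that similar structures never straddle two splits.

```python
import random
from collections import defaultdict


def cluster_split(cluster_of, fractions=(0.7, 0.15, 0.15), seed=0):
    """Assign whole clusters to train/val/test so no cluster spans two splits.

    cluster_of: dict mapping an item id to its (precomputed) cluster label.
    fractions:  target share of items for train, val, and test.
    """
    # Group item ids by cluster label.
    clusters = defaultdict(list)
    for item, cluster_id in cluster_of.items():
        clusters[cluster_id].append(item)

    # Deterministic base order, seeded shuffle, then largest clusters first
    # (the stable sort keeps the shuffled order among equal-sized clusters).
    groups = sorted((sorted(g) for g in clusters.values()))
    random.Random(seed).shuffle(groups)
    groups.sort(key=len, reverse=True)

    total = len(cluster_of)
    targets = [f * total for f in fractions]
    splits = [[], [], []]
    for group in groups:
        # Greedy choice: send the cluster to the split furthest below target.
        deficits = [targets[i] - len(splits[i]) for i in range(3)]
        splits[deficits.index(max(deficits))].extend(group)

    return dict(zip(("train", "val", "test"), splits))


# Toy usage: 40 hypothetical RNAs spread over 8 similarity clusters.
cluster_of = {f"rna{i}": i % 8 for i in range(40)}
splits = cluster_split(cluster_of)
print({name: len(items) for name, items in splits.items()})
```

Because clusters are indivisible, the resulting split sizes only approximate the requested fractions; the guarantee is that no cluster contributes items to more than one split.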
