3D-based RNA function prediction tools in rnaglib

Carlos Oliver, Vincent Mallet, Jérôme Waldispühl

TL;DR

The number of available RNA 3D structures is growing, enabling data-driven discovery of structure–function relationships, but standardized representations and datasets are still lacking. The chapter presents rnaglib, a Python toolkit that encodes RNA 3D structures as expressive graphs using the Leontis-Westhof base-pair geometry classification, provides dataset construction utilities, and supports self-supervised and supervised learning workflows. It introduces RNADataset and multiple Representations (graph, point cloud, voxel), together with a training loop for predicting functional attributes such as binding residues, enabling end-to-end ML pipelines. By linking 3D structure to function in a scalable, extensible framework, the work lowers the barrier to geometric deep learning on RNA and supports RNA design and discovery.

Abstract

Understanding the connection between complex structural features of RNA and biological function is a fundamental challenge in evolutionary studies and in RNA design. However, building datasets of RNA 3D structures and making appropriate modeling choices remain time-consuming and lack standardization. In this chapter, we describe how to use rnaglib to train supervised and unsupervised machine learning-based function prediction models on datasets of RNA 3D structures.

Paper Structure (16 sections, 2 figures)


Figures (2)

  • Figure 1: RNA 3D representation learning paradigm. We use the Leontis-Westhof base pair geometry classification to build expressive graphs of RNA 3D models. An encoder network $\mathrm{ENC}_\theta$ embeds the graphs into a learned embedding space $\mathbf{Z}$, and a decoder $\mathrm{DEC}_\psi$ maps the embeddings to a property space $\mathbf{P}$.
  • Figure 2: Sample graph drawing of PDBID: 1NLF
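
The encoder/decoder paradigm in the Figure 1 caption can be illustrated with a small, self-contained sketch. It does not use the rnaglib API: the toy hairpin, the residue identifiers, the edge labels ("B53" for a backbone link, "CWW" for a cis Watson-Crick/Watson-Crick pair), and the functions `encode` and `decode` are all hypothetical, and fixed numbers stand in for the learned parameters $\theta$ and $\psi$. The encoder performs one round of edge-type-weighted neighbor aggregation to produce per-residue embeddings (the space $\mathbf{Z}$), and the decoder maps each embedding to a score in (0, 1) (the space $\mathbf{P}$), such as a binding-residue probability.

```python
import math

# Hypothetical toy hairpin: nodes are residues, edges carry an interaction label.
# "B53" marks a backbone link, "CWW" a cis Watson-Crick/Watson-Crick base pair
# (labels chosen for illustration only).
nodes = {"A.1": "G", "A.2": "C", "A.3": "A", "A.4": "A",
         "A.5": "A", "A.6": "G", "A.7": "C"}
edges = [
    ("A.1", "A.2", "B53"), ("A.2", "A.3", "B53"), ("A.3", "A.4", "B53"),
    ("A.4", "A.5", "B53"), ("A.5", "A.6", "B53"), ("A.6", "A.7", "B53"),
    ("A.1", "A.7", "CWW"), ("A.2", "A.6", "CWW"),
]

ALPHABET = "ACGU"
EDGE_WEIGHT = {"B53": 0.5, "CWW": 1.0}  # fixed stand-in for learned theta
PSI = [0.2, -0.1, 0.3, 0.0]             # fixed stand-in for learned psi
BIAS = -0.5


def one_hot(nt):
    """One-hot encode a nucleotide over the alphabet A, C, G, U."""
    return [1.0 if nt == a else 0.0 for a in ALPHABET]


def encode(nodes, edges):
    """ENC: one round of message passing weighted by edge type."""
    h = {v: one_hot(nt) for v, nt in nodes.items()}
    z = {v: list(vec) for v, vec in h.items()}  # start from each node's own features
    for u, v, label in edges:
        w = EDGE_WEIGHT[label]
        for i in range(len(ALPHABET)):
            # Edges are undirected: each endpoint receives the other's features.
            z[u][i] += w * h[v][i]
            z[v][i] += w * h[u][i]
    return z


def decode(z):
    """DEC: logistic readout mapping each embedding to a score in (0, 1)."""
    scores = {}
    for v, vec in z.items():
        logit = sum(p * x for p, x in zip(PSI, vec)) + BIAS
        scores[v] = 1.0 / (1.0 + math.exp(-logit))
    return scores


Z = encode(nodes, edges)  # per-residue embeddings
P = decode(Z)             # per-residue property scores
```

In a real pipeline the weights would be trained against labeled residues rather than fixed by hand, and the graph would be built from an annotated 3D structure instead of being written out manually.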