RNAGenScape: Property-guided Optimization and Interpolation of mRNA Sequences with Manifold Langevin Dynamics

Danqi Liao, Chen Liu, Xingzhi Sun, Dié Tang, Haochen Wang, Scott Youlten, Srikar Krishna Gopinath, Haejeong Lee, Ethan C. Strayer, Antonio J. Giraldez, Smita Krishnaswamy

TL;DR

This work tackles the challenge of designing and optimizing mRNA sequences under data scarcity and complex sequence-function relationships by introducing RNAGenScape, a property-guided manifold Langevin dynamics framework. It combines an organized autoencoder that structures the latent space by target properties, a learned manifold projector to keep updates biologically plausible, and SUGAR-based augmentation to fill undersampled regions, enabling efficient, trajectory-like optimization and interpolation on the mRNA manifold. Empirically, RNAGenScape achieves superior property optimization and data-aligned trajectories across three real datasets, with fast inference and the ability to decode intermediate steps for interpretation. The approach advances controllable mRNA design by constraining exploration to a learned data manifold, offering a scalable paradigm for latent-space exploration in biological sequence design.

Abstract

mRNA design and optimization are important in synthetic biology and therapeutic development, but remain understudied in machine learning. Systematic optimization of mRNAs is hindered by scarce and imbalanced data as well as complex sequence-function relationships. We present RNAGenScape, a property-guided manifold Langevin dynamics framework that iteratively updates mRNA sequences within a learned latent manifold. RNAGenScape combines an organized autoencoder, which structures the latent space by target properties for efficient and biologically plausible exploration, with a manifold projector that contracts each update step back onto the manifold. RNAGenScape supports property-guided optimization and smooth interpolation between sequences, while remaining robust under scarce and undersampled data and ensuring that intermediate products stay close to the viable mRNA manifold. Across three real mRNA datasets, RNAGenScape improves the target properties with high success rates and efficiency, outperforming various generative or optimization methods developed for proteins or non-biological data. By providing continuous, data-aligned trajectories that reveal how edits influence function, RNAGenScape establishes a scalable paradigm for controllable mRNA design and latent space exploration in mRNA sequence modeling.
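
The loop described in the abstract (a property-guided step, Langevin noise, then projection back to the manifold) can be illustrated with a toy sketch. This is not the authors' implementation, and the exact update rule is not given in this section. Everything here is an assumption made for illustration: the latent space is 2D, the "manifold" is the unit circle, `project_to_manifold` stands in for the learned manifold projector, `property_grad` stands in for the gradient of a learned property predictor, and `step_size` and `temperature` are hypothetical hyperparameters.

```python
import numpy as np

def property_grad(z):
    """Toy stand-in for the gradient of a learned property predictor.

    The assumed property is simply the first latent coordinate, so its
    gradient is constant.
    """
    return np.array([1.0, 0.0])

def project_to_manifold(z):
    """Toy stand-in for a learned manifold projector: snap to the unit circle."""
    return z / np.linalg.norm(z)

def manifold_langevin(z0, n_steps=10, step_size=0.1, temperature=0.01, seed=0):
    """Run projected Langevin updates and return the whole latent trajectory."""
    rng = np.random.default_rng(seed)
    z = project_to_manifold(np.asarray(z0, dtype=float))
    trajectory = [z.copy()]
    for _ in range(n_steps):
        noise = rng.standard_normal(z.shape)
        # Generic Langevin step: drift along the property gradient plus noise.
        z = z + step_size * property_grad(z) + np.sqrt(2.0 * step_size * temperature) * noise
        # Contract the (possibly off-manifold) point back onto the manifold.
        z = project_to_manifold(z)
        trajectory.append(z.copy())
    return np.stack(trajectory)

trajectory = manifold_langevin([-0.6, 0.8])
scores = trajectory[:, 0]  # toy property value at every intermediate step
```

Because every intermediate point is stored after projection, each step of `trajectory` lies on the toy manifold. This mirrors the claim that intermediate products can be decoded and inspected, although in the real method decoding would go through the autoencoder's decoder rather than a 2D circle.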

Paper Structure

This paper contains 44 sections, 7 equations, 9 figures, 6 tables, 1 algorithm.

Figures (9)

  • Figure 1: Schematic of RNAGenScape. (a) We first train an organized latent space for mRNA sequences by jointly optimizing reconstruction and property prediction objectives. (b) We then train a manifold projector while the encoder's weights are frozen. (c) For undersampled mRNA manifolds, we use SUGAR to learn key dimensions in the manifold and fill undersampled regions. (d) During optimization, the manifold projector brings off-manifold points back to the manifold. (e) We can use the encoder and the manifold projector to optimize the properties of given input mRNA sequences or interpolate between sequences. Notably, the intermediate products can also be decoded. Best viewed zoomed in.
  • Figure 2: Latent space trajectories of RNAGenScape over 10 optimization steps. The trajectories follow smooth and reasonable paths with steady improvement in the target property. (a) Trajectories in the PHATE space. (b) 2D structures. (c) 3D structures. (d) Intermediate products of various methods midway through optimization (5th step).
  • Figure 3: Latent space interpolation trajectories from 5 sources to 4 targets. Each trajectory is shown as a line fading from bright to dark in a consistent color. RNAGenScape produces smooth and coherent paths on the manifold between arbitrary input-target mRNA pairs.
  • Figure 4: Latent space $\ell_2$ distances during interpolation show a smooth and monotonic transition from the source to the target. Results are averaged over all data samples.
  • Figure 5: RNAGenScape optimization is step-efficient and remains stable over a range of optimization steps.
  • ...and 4 more figures