EquiJump: Protein Dynamics Simulation via SO(3)-Equivariant Stochastic Interpolants

Allan dos Santos Costa, Ilan Mitnikov, Franco Pellegrini, Ameya Daigavane, Mario Geiger, Zhonglin Cao, Karsten Kreis, Tess Smidt, Emine Kucukbenli, Joseph Jacobson

TL;DR

EquiJump tackles the computational burden of all-atom molecular dynamics by introducing a transferable, SO(3)-equivariant framework that uses Two-Sided Stochastic Interpolants to propagate protein conformations across long time steps. Operating directly on 3D all-atom representations with a Tensor Cloud geometric encoding and an equivariant neural network architecture, it learns a time-evolution operator conditioned on the current structure, enabling stable, long-horizon dynamics. Across 12 fast-folding proteins, EquiJump demonstrates state-of-the-art accuracy and transferability, outperforming diffusion-based and prior transport methods while delivering significant speedups. The work combines a rigorous stochastic interpolant formulation, geometry-aware representations, and MSM/TICA-based equilibrium evaluation to provide a scalable path toward practical, long-timescale protein dynamics simulation. This approach has the potential to accelerate drug discovery and protein engineering workflows by providing reliable, fast generation of dynamics at the all-atom level.

Abstract

Mapping the conformational dynamics of proteins is crucial for elucidating their functional mechanisms. While Molecular Dynamics (MD) simulation enables detailed time evolution of protein motion, its computational cost hinders its use in practice. To address this challenge, multiple deep learning models for reproducing and accelerating MD have been proposed, drawing on transport-based generative methods. However, existing work focuses on generation through transport of samples from prior distributions, which can often be distant from the data manifold. The recently proposed framework of stochastic interpolants instead enables transport between arbitrary distribution endpoints. Building upon this work, we introduce EquiJump, a transferable SO(3)-equivariant model that bridges all-atom protein dynamics simulation time steps directly. Our approach unifies diverse sampling methods and is benchmarked against existing models on trajectory data of fast-folding proteins. EquiJump achieves state-of-the-art results on dynamics simulation with a transferable model on all of the fast-folding proteins.
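To make "transport between arbitrary distribution endpoints" concrete, the following is a minimal sketch of a two-sided stochastic interpolant between a current frame and a future frame. It is an illustration, not the authors' implementation: the function name `two_sided_interpolant`, the `noise_scale` parameter, the linear schedules alpha(s) = 1 - s and beta(s) = s, the noise schedule gamma(s) = noise_scale * sqrt(s(1 - s)), and the toy 5-atom point clouds are all assumptions chosen for simplicity. The only property relied on is that the noise term vanishes at both endpoints, so the path starts exactly at one data sample and ends exactly at the other.

```python
import numpy as np


def two_sided_interpolant(x0, x1, s, z, noise_scale=1.0):
    """Return x_s = alpha(s) * x0 + beta(s) * x1 + gamma(s) * z.

    Illustrative (assumed) schedules:
      alpha(s) = 1 - s, beta(s) = s,
      gamma(s) = noise_scale * sqrt(s * (1 - s)),
    so gamma(0) = gamma(1) = 0 and both endpoints are data samples.
    """
    alpha = 1.0 - s
    beta = s
    gamma = noise_scale * np.sqrt(s * (1.0 - s))
    return alpha * x0 + beta * x1 + gamma * z


rng = np.random.default_rng(0)

# Toy stand-ins for two simulation frames: 5 "atoms" in 3D (hypothetical data).
x_t = rng.normal(size=(5, 3))
x_next = x_t + 0.1 * rng.normal(size=(5, 3))

# Gaussian perturbation applied only in the interior of the path.
z = rng.normal(size=(5, 3))

start = two_sided_interpolant(x_t, x_next, 0.0, z)
middle = two_sided_interpolant(x_t, x_next, 0.5, z)
end = two_sided_interpolant(x_t, x_next, 1.0, z)
```

Under these assumed schedules, `start` equals the current frame, `end` equals the future frame, and `middle` is their midpoint plus a Gaussian perturbation. This contrasts with one-sided transport, where the path begins at a pure-noise sample from a prior.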

Paper Structure

This paper contains 33 sections, 23 equations, 16 figures, 4 tables, 5 algorithms.

Figures (16)

  • Figure 1: Direct bridging of 3D Protein Simulation: EquiJump runs a stochastic interpolant-based transport on 3D coordinates and geometric features to generate future time frames from an initial state. Gray boxes depict transport across the latent space, which takes Gaussian perturbations and uses learned noise and drift to transform all-atom proteins across time and 3D space.
  • Figure 2: Neural Transport of Tensor Clouds. (a) DDPM defines an SDE for denoising samples from a Gaussian prior, while (b) standard Flow Matching traces a velocity field-based ODE for moving the Gaussian samples. (c) Two-Sided Stochastic Interpolants instead enable transport through a local, normally-perturbed latent space that remains close to the data manifold.
  • Figure 3: EquiJump Architecture: (a) The Self-Interaction Layer updates geometric features independently, mixing $\mathbf V^l$ of different degrees into new features through a Tensor Square operation. (b) The Spatial Convolution layer updates representations by aggregating the tensor product of neighbors' messages with the spherical harmonics embedding of the relative 3D vector between the positions of those neighbors. (c) We stack the above modules to form a block, and build a base network out of $L$ blocks for making predictions. (d) A shared conditioner and four headers are built from the base network. The conditioner processes the sequence and the current simulation step, producing latent embeddings that are fed to the prediction headers. The headers independently predict feature and coordinate updates for the drift and noise components of the stochastic process.
  • Figure 4: Protein G and Free Energies on its TIC components for different models of Generative Simulation. (Left) Protein G crystal. (Right) Estimated free energies on the first TIC components for samples produced by DDPM, Flow Matching and Stochastic Interpolants. We observe that Two-Sided Interpolants outperform the other transport methods in recovering the TICA profile.
  • Figure 5: EquiJump Samples: (a) We visualize the 3D distribution of 1500 random backbone samples from EquiJump trajectories. We align samples to the crystal backbone (shown in black) and verify that our model stays close to the native state basin. We show (b) mean pairwise $\mathbf C_\alpha$ distance matrices, (c) Ramachandran plots of backbone dihedrals and (d) Janin plots of sidechain dihedrals of EquiJump samples against reference trajectory data.
  • ...and 11 more figures