ConfRover: Simultaneous Modeling of Protein Conformation and Dynamics via Autoregression

Yuning Shen, Lihao Wang, Huizhuo Yuan, Yan Wang, Bangji Yang, Quanquan Gu

TL;DR

ConfRover introduces an autoregressive framework that jointly learns protein conformational distributions and dynamics from MD trajectories, enabling both time-dependent trajectory generation and time-independent sampling within a single model. The architecture combines a protein-aware encoder, a causal latent trajectory module, and an SE(3) diffusion-based structure decoder to generate continuous-space conformations frame-by-frame. Empirical results on the ATLAS dataset show ConfRover outperforms state-of-the-art trajectory models in capturing dynamic magnitudes and state transitions, while achieving competitive quality in time-independent sampling and enabling interpolation between end states with ConfRover-interp. The approach offers a flexible, scalable path toward unified modeling of protein dynamics that can accelerate exploration of conformational spaces and dynamics, potentially speeding up protein design and understanding of functional motions.

Abstract

Understanding protein dynamics is critical for elucidating their biological functions. The increasing availability of molecular dynamics (MD) data enables the training of deep generative models to efficiently explore the conformational space of proteins. However, existing approaches either fail to explicitly capture the temporal dependencies between conformations or do not support direct generation of time-independent samples. To address these limitations, we introduce ConfRover, an autoregressive model that simultaneously learns protein conformation and dynamics from MD trajectories, supporting both time-dependent and time-independent sampling. At the core of our model is a modular architecture comprising: (i) an encoding layer, adapted from protein folding models, that embeds protein-specific information and conformation at each time frame into a latent space; (ii) a temporal module, a sequence model that captures conformational dynamics across frames; and (iii) an SE(3) diffusion model as the structure decoder, generating conformations in continuous space. Experiments on ATLAS, a large-scale protein MD dataset of diverse structures, demonstrate the effectiveness of our model in learning conformational dynamics and supporting a wide range of downstream tasks. ConfRover is the first model to sample both protein conformations and trajectories within a single framework, offering a novel and flexible approach for learning from protein MD data. Project website: https://bytedance-seed.github.io/ConfRover.
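The frame-by-frame generation described above can be illustrated with a toy sketch. This is not the authors' implementation: the functions `encode`, `temporal_update`, and `decode` are hypothetical placeholders (a copy, a running average, and a Gaussian perturbation) standing in for the encoding layer, the temporal module, and the SE(3) diffusion decoder, and a "frame" is just a short list of numbers rather than a protein structure. Only the control flow is the point: each new frame is sampled conditioned on the frames before it, then fed back in.

```python
import random


def encode(frame):
    # Hypothetical encoder: the latent is simply a copy of the frame.
    return list(frame)


def temporal_update(latents):
    # Hypothetical causal update: average the latents of all frames so far.
    n = len(latents)
    return [sum(vals) / n for vals in zip(*latents)]


def decode(latent, rng, noise_scale=0.1):
    # Hypothetical stochastic decoder: Gaussian perturbation of the latent,
    # standing in for sampling from a conditional diffusion model.
    return [v + rng.gauss(0.0, noise_scale) for v in latent]


def generate_trajectory(first_frame, num_frames, seed=0):
    rng = random.Random(seed)
    frames = [list(first_frame)]
    latents = [encode(first_frame)]
    while len(frames) < num_frames:
        context = temporal_update(latents)  # depends on past frames only
        new_frame = decode(context, rng)    # sample the next frame
        frames.append(new_frame)
        latents.append(encode(new_frame))   # feed the sample back in
    return frames


trajectory = generate_trajectory([0.0, 1.0, 2.0], num_frames=5)
print(len(trajectory), len(trajectory[0]))
```

In this toy setting, calling `generate_trajectory` with `num_frames=1` returns only the conditioning frame, and fixing `seed` makes the sampled trajectory reproducible.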

Paper Structure

This paper contains 49 sections, 15 equations, 17 figures, 19 tables, 1 algorithm.

Figures (17)

  • Figure 1: Key ideas of ConfRover. (A) Conformation generation tasks with various conditioning configurations. Each block denotes a frame, and arrows indicate the sequential dependencies among frames under the autoregressive formulation. Initial conditioning frames are outlined in black. In conformation interpolation, the last frame is repositioned and prepended to the first frame to give the proper sequential dependencies. (B) ConfRover models each frame as a conditional distribution given the preceding frames. Sequential dependencies are captured through latent variables $\mathbf{h}$, and conformations are sampled from a diffusion decoder conditioned on the updated latent.
  • Figure 2: Causal sequence model that generates a trajectory ($\hat{\mathbf{x}}^2, \dots$) from the mask token "[M]" and the conditioning frame $\mathbf{x}^1$. Each frame attends only to its previous frames. Attention activations for $\hat{\mathbf{x}}^3$ are highlighted in orange.
  • Figure 3: Architecture overview. (A) The Encoding Layer embeds the protein sequence and the input structure for each frame as a frame latent representation $\mathbf{h}^l$, comprising single and pair embeddings; (B) The Trajectory Module then updates the frame latent $\mathbf{h}^l$ using interleaved structural and temporal update blocks; (C) A diffusion-based Structure Decoder learns to denoise noisy conformations conditioned on the updated frame latent $\mathbf{h}^l$; during inference, it samples conformations from the prior distribution. See Appendix \ref{ap:arch_details} for details.
  • Figure 4: Visualization of six proteins from multi-start. Trajectory conformations are colored by their secondary structures and superposed to show the dynamic ensemble. MDGen primarily exhibits local movements, whereas ConfRover captures conformational changes similar to those in MD simulations.
  • Figure 5: Results from 100 ns simulation. (A) Correlations of principal dynamic modes between sample and reference trajectories, evaluated at varying lag times. The mean and standard deviation, computed from five individual runs for MDGen and ConfRover, are shown as a line and a shaded area. (B) Example trajectories illustrating the states explored by different methods (downsampled by 5 frames for visualization). The blue background indicates the density of the ground-truth conformation distribution from the MD reference.
  • ...and 12 more figures
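
The causal attention pattern described in the Figure 2 caption can be sketched with a lower-triangular mask. This is a minimal illustration under assumptions, not the paper's code: the helper names `causal_mask` and `masked_softmax` are hypothetical, the attention scores are made up, and each position is allowed to attend to itself as well as to earlier positions (the common convention, which the caption does not spell out).

```python
import math


def causal_mask(num_frames):
    # mask[i][j] is True when position i may attend to position j (j <= i).
    return [[j <= i for j in range(num_frames)] for i in range(num_frames)]


def masked_softmax(scores, mask_row):
    # Softmax over the allowed positions only; masked positions get weight 0.
    allowed = [s for s, m in zip(scores, mask_row) if m]
    peak = max(allowed)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(scores, mask_row)]
    total = sum(exps)
    return [e / total for e in exps]


mask = causal_mask(4)
# Attention weights for the third position with uniform (made-up) scores.
weights = masked_softmax([0.0, 0.0, 0.0, 0.0], mask[2])
print(weights)
```

With uniform scores, the third position spreads its attention equally over the first three positions and places zero weight on the fourth, which lies in its future.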