Improving AlphaFlow for Efficient Protein Ensembles Generation

Shaoning Li; Mingyu Li; Yusong Wang; Xinheng He; Nanning Zheng; Jian Zhang; Pheng-Ann Heng

TL;DR

This work tackles the high computational cost of generating protein conformational ensembles with flow-based approaches. It introduces AlphaFlow-Lit, a feature-conditioned, light-weight variant that freezes the Evoformer and relies on precomputed single and pair features to accelerate sampling, achieving an approximately $47\times$ speedup while maintaining performance comparable to AlphaFlow. The authors validate the method on ATLAS MD trajectories, showing that AlphaFlow-Lit preserves essential and global dynamics similar to MD and outperforms the distilled variant in diversity and correlation metrics, while its runtime scales better with sequence length. The approach makes dense protein ensemble generation more practical, enabling faster exploration of conformational landscapes and large-scale analyses of dynamics and long-range couplings with deep learning tools.

Abstract

Investigating the conformational landscapes of proteins is a crucial way to understand their biological functions and properties. AlphaFlow stands out as a sequence-conditioned generative model that introduces flexibility into structure prediction models by fine-tuning AlphaFold under the flow-matching framework. Despite the efficient sampling afforded by flow matching, AlphaFlow still requires multiple runs of AlphaFold to generate a single conformation. Because AlphaFold is computationally expensive, AlphaFlow's applicability is limited when sampling larger protein ensembles or longer chains within a constrained timeframe. In this work, we propose a feature-conditioned generative model called AlphaFlow-Lit to realize efficient protein ensemble generation. In contrast to full fine-tuning of the entire architecture, we focus solely on the light-weight structure module to reconstruct the conformation. AlphaFlow-Lit performs on par with AlphaFlow and surpasses its distilled version without pretraining, all while achieving a significant sampling acceleration of around 47 times. This advancement in efficiency showcases the potential of AlphaFlow-Lit to enable faster and more scalable generation of protein ensembles.
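
The source of the saving described above can be illustrated with a toy call-counting sketch. This is not the authors' implementation: `trunk` and `head` are hypothetical stand-ins for the roles of the Evoformer and the structure module, the interpolation update is an assumed, simplified flow-matching step, and reusing one set of cached features across all samples of a protein is an assumption made for illustration.

```python
import random

# Call counters for the two toy components (hypothetical names).
calls = {"trunk": 0, "head": 0}

def trunk(seq_len):
    """Stand-in for the expensive feature extractor (the Evoformer's role)."""
    calls["trunk"] += 1
    rng = random.Random(0)
    return [[rng.gauss(0, 1) for _ in range(3)] for _ in range(seq_len)]

def head(features, x_t, t):
    """Stand-in for the light-weight structure module.

    Returns a toy 'predicted clean structure'; t is accepted but unused here.
    """
    calls["head"] += 1
    return [[0.5 * (f + x) for f, x in zip(f_row, x_row)]
            for f_row, x_row in zip(features, x_t)]

def sample(seq_len, num_steps, rng, cached=None):
    """Generate one toy conformation with num_steps denoising steps."""
    x = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(seq_len)]  # noisy start
    for n in range(num_steps):
        t = 1 - n / num_steps          # current time
        s = 1 - (n + 1) / num_steps    # next time; reaches 0 on the last step
        # Sequence-conditioned: rerun the trunk every step.
        # Feature-conditioned: reuse the precomputed features.
        feats = cached if cached is not None else trunk(seq_len)
        x0_hat = head(feats, x, t)
        # Assumed update: move from x_t toward the predicted structure.
        w = s / t
        x = [[w * xc + (1 - w) * pc for xc, pc in zip(x_row, p_row)]
             for x_row, p_row in zip(x, x0_hat)]
    return x

def count_calls(seq_len, num_steps, num_samples, reuse_features):
    """Count trunk/head evaluations needed for an ensemble of num_samples."""
    calls["trunk"] = 0
    calls["head"] = 0
    rng = random.Random(1)
    cached = trunk(seq_len) if reuse_features else None
    for _ in range(num_samples):
        sample(seq_len, num_steps, rng, cached)
    return dict(calls)
```

In this sketch, `count_calls(16, 10, 5, reuse_features=False)` evaluates the expensive component once per denoising step per sample, whereas `reuse_features=True` evaluates it once in total and repeats only the light-weight component. The counts say nothing about wall-clock time, which depends on the real cost of each module.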

Paper Structure (14 sections, 4 equations, 2 figures, 2 tables, 2 algorithms)

Introduction
Preliminary
Flow matching
AlphaFlow
Method
Experiments
Runtime comparison
Protein dynamics analysis
Local arrangements analysis
Long-range correlations analysis
Conclusion
Limitation and future work
Method Details
Runtime comparison

Figures (2)

Figure 1: Model architecture of sequence-conditioned AlphaFlow (left) and feature-conditioned AlphaFlow-Lit (right). $T$: Denoising steps; $x_t$: Noisy structure; $\tilde{x}_0$: Predicted structure.
Figure 2: Visualization of MD evaluation from MD, AlphaFlow-Lit and AlphaFlow. (A) Runtime comparison as a function of sequence length, with fitted curves. (B) Principal component analysis (PCA) for 6q9c_A ensembles. The representative structures are pointed out. (C, D) Ensembles of PDB ID 7buy_A with C$\alpha$ RMSF by residue index shown in insets, and their dynamic cross-correlation matrices (DCCMs).
