Constraint Decoupled Latent Diffusion for Protein Backmapping
Xu Han, Yuancheng Sun, Kai Chen, Yuxuan Ren, Kang Liu, Qiwei Ye
TL;DR
CODLAD is a two-stage latent diffusion framework that backmaps coarse-grained (CG) protein structures to all-atom (AA) detail by decoupling structural constraint handling from generation. It first encodes AA structures into discrete, constraint-informed latent representations with a dual-level SE(3)-equivariant GNN, then runs diffusion in this latent space conditioned on the CG input. CODLAD achieves superior atomistic accuracy (RMSD), topological fidelity (GED), and conformational diversity (DIV), with substantial inference speedups. Across the PED, ATLAS, PDB, and DES datasets, the method generalizes well to unseen trajectory systems, and ablation studies attribute the gains to constraint decoupling, the discrete latent space, and latent-space diffusion. The work offers a scalable, generalizable route to accurate and diverse CG-to-all-atom reconstruction, and the code is publicly available.
Abstract
Coarse-grained (CG) molecular dynamics simulations enable efficient exploration of protein conformational ensembles. However, reconstructing atomic details from CG structures (backmapping) remains a challenging problem. Current approaches face an inherent trade-off between maintaining atomistic accuracy and exploring diverse conformations, often necessitating complex constraint handling or extensive refinement steps. To address these challenges, we introduce a novel two-stage framework, named CODLAD (COnstraint Decoupled LAtent Diffusion). This framework first compresses atomic structures into discrete latent representations, explicitly embedding structural constraints, thereby decoupling constraint handling from generation. Subsequently, it performs efficient denoising diffusion in this latent space to produce structurally valid and diverse all-atom conformations. Comprehensive evaluations on diverse protein datasets demonstrate that CODLAD achieves state-of-the-art performance in atomistic accuracy, conformational diversity, and computational efficiency while exhibiting strong generalization across different protein systems. Code is available at https://github.com/xiaoxiaokuye/CODLAD.
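Atomistic accuracy in backmapping is commonly quantified by the root-mean-square deviation (RMSD) between reconstructed and reference atom coordinates. The following is a minimal sketch of that metric, not the evaluation code from the CODLAD repository. It assumes both structures list the same atoms in the same order and already share a coordinate frame, so no superposition is performed; the function name `rmsd` and the toy coordinates are hypothetical.

```python
import numpy as np


def rmsd(pred, ref):
    """RMSD between two (N, 3) coordinate arrays.

    Assumption: atoms are in matching order and both structures are
    already expressed in the same frame (no alignment is applied).
    """
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    if pred.shape != ref.shape or pred.ndim != 2 or pred.shape[1] != 3:
        raise ValueError("pred and ref must both have shape (N, 3)")
    # Squared displacement per atom, averaged over atoms, then square-rooted.
    sq_dist = ((pred - ref) ** 2).sum(axis=1)
    return float(np.sqrt(sq_dist.mean()))


# Toy example (hypothetical coordinates, in angstroms): every atom of the
# "reconstruction" is shifted by 0.1 along x relative to the reference.
ref = np.array([[0.0, 0.0, 0.0],
                [1.5, 0.0, 0.0],
                [1.5, 1.5, 0.0]])
pred = ref + np.array([0.1, 0.0, 0.0])

print(round(rmsd(pred, ref), 3))
```

Because backmapping conditions on a fixed CG structure, the reconstruction and the reference usually share a frame already; if they did not, a rigid-body superposition step would be needed before calling a function like this.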
