ProteinAE: Protein Diffusion Autoencoders for Structure Encoding

Shaoning Li, Le Zhuo, Yusong Wang, Mingyu Li, Xinheng He, Fandi Wu, Hongsheng Li, Pheng-Ann Heng

TL;DR

ProteinAE introduces a non-equivariant diffusion autoencoder that maps backbone coordinates directly from $E(3)$ to a continuous latent $z$, avoiding the complexities of $SE(3)$ equivariance and discrete tokenization. Built on Diffusion Transformer blocks with All-Atom Attention and a bottleneck architecture, it is trained with a single flow-matching objective, and its latent space supports an efficient latent diffusion model (PLDM) for high-quality structure generation. The learned latent space supports accurate downstream physicochemical predictions and unconditional generation that is competitive with leading structure-based methods, while reducing sampling time and GPU memory usage (compared against RFDiffusion and DPLM-2 in Figure 3C). The approach simplifies protein structure modeling, offering scalable, high-fidelity reconstruction and generation with practical implications for design and analysis. Code is publicly available.

Abstract

Developing effective representations of protein structures is essential for advancing protein science, particularly for protein generative modeling. Current approaches often grapple with the complexities of the SE(3) manifold, rely on discrete tokenization, or require multiple training objectives, all of which can hinder model optimization and generalization. We introduce ProteinAE, a novel and streamlined protein diffusion autoencoder designed to overcome these challenges by directly mapping protein backbone coordinates from E(3) into a continuous, compact latent space. ProteinAE employs a non-equivariant Diffusion Transformer with a bottleneck design for efficient compression and is trained end-to-end with a single flow matching objective, substantially simplifying the optimization pipeline. We demonstrate that ProteinAE achieves state-of-the-art reconstruction quality, outperforming existing autoencoders. The resulting latent space serves as a powerful foundation for a latent diffusion model that bypasses the need for explicit equivariance. This enables efficient, high-quality structure generation that is competitive with leading structure-based approaches and significantly outperforms prior latent-based methods. Code is available at https://github.com/OnlyLoveKFC/ProteinAE_v1.

Paper Structure

This paper contains 41 sections, 13 equations, 4 figures, 6 tables, and 5 algorithms.

Figures (4)

  • Figure 1: Comparison of ESM3 VQ-VAE and our ProteinAE.
  • Figure 2: Overall architecture of ProteinAE. (a) The encoder maps a protein structure to a latent representation $z$; (b) The flow decoder predicts the velocity field $v_t^\theta$ for structure reconstruction conditioned on $z$; (c) Downstream tasks like PLDM operating over the learned latent space.
  • Figure 3: Summary of ProteinAE reconstruction, generation, and model architecture analysis. (A) Visual comparison of protein structure reconstruction quality between ProteinAE and ESM3 VQ-VAE. (B) Ablation studies on key architectural components: the impact of dimension and length bottlenecks, evaluation of model scalability with increased parameters, and analysis of using DiT registers for latent compression. (C) Generation efficiency comparison of ProteinAE-PLDM, RFDiffusion, and DPLM-2, including average sampling time and GPU memory usage. (D) Visual examples of protein backbone structures generated unconditionally by ProteinAE-PLDM.
  • Figure 4: Illustration of structural collapse observed during protein generation by PLDM under length bottleneck conditions.
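
The Figure 2 caption describes a flow decoder that predicts a velocity field $v_t^\theta$ conditioned on the latent $z$, trained with a single flow-matching objective. As a rough illustration only, the sketch below computes a one-sample estimate of a conditional flow-matching loss in plain Python. The linear noise-to-data path, the uniform time sampling, and the names `flow_matching_loss`, `velocity_fn`, and `oracle` are assumptions made for this example; the page does not state the paper's exact interpolation path or time convention, and this is not the authors' implementation.

```python
import random


def flow_matching_loss(x1, velocity_fn, z, rng):
    """One-sample estimate of a conditional flow-matching loss.

    x1:          list of [x, y, z] backbone coordinates (the data sample).
    velocity_fn: hypothetical decoder called as velocity_fn(x_t, t, z),
                 returning one predicted 3-vector per atom.
    z:           latent code from some encoder (treated as opaque here).
    rng:         a random.Random instance, so runs can be seeded.
    """
    t = rng.random()  # time sampled uniformly in [0, 1)
    # x0 is Gaussian noise with the same shape as the data.
    x0 = [[rng.gauss(0.0, 1.0) for _ in atom] for atom in x1]
    # Assumed linear path: x_t = (1 - t) * x0 + t * x1.
    xt = [[(1 - t) * a + t * b for a, b in zip(n, d)] for n, d in zip(x0, x1)]
    # Along that path the target velocity is constant: x1 - x0.
    target = [[b - a for a, b in zip(n, d)] for n, d in zip(x0, x1)]
    pred = velocity_fn(xt, t, z)
    # Mean squared error between predicted and target velocities.
    sq = [(p - q) ** 2 for pa, ta in zip(pred, target) for p, q in zip(pa, ta)]
    return sum(sq) / len(sq)


def oracle(xt, t, z):
    # Cheating predictor for illustration: z holds the clean coordinates,
    # so the exact velocity (x1 - x_t) / (1 - t) can be recovered.
    return [[(b - a) / (1 - t) for a, b in zip(p, d)] for p, d in zip(xt, z)]


# Toy three-atom "backbone"; the coordinates are made up for the example.
coords = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.0, 1.4, 0.0]]
loss = flow_matching_loss(coords, oracle, coords, random.Random(0))
```

Passing the clean coordinates as `z` makes the oracle's loss vanish up to floating-point error, which is a convenient sanity check. In a real model `z` would be a compressed latent and `velocity_fn` a learned network, so the loss would be minimized by gradient descent rather than solved exactly.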