Structure Language Models for Protein Conformation Generation

Jiarui Lu, Xiaoyin Chen, Stephen Zhewen Lu, Chence Shi, Hongyu Guo, Yoshua Bengio, Jian Tang

TL;DR

This work introduces Structure Language Modeling (SLM) to efficiently generate protein conformations by encoding 3D structures into a discrete latent space and performing conditional language modeling on latent structure tokens. A two-stage training pipeline couples a discrete VAE (dVAE) for structure quantization with a latent prior learned by language models, enabling roto-translation invariant and scalable sampling. The ESMDiff instantiation, based on masked discrete diffusion and fine-tuned from ESM3, demonstrates state-of-the-art performance across equilibrium dynamics, conformational-change pairs, and intrinsically disordered proteins, while delivering 20–100× faster sampling than diffusion-based baselines. The approach reframes conformational sampling as amortized distribution learning in latent space, leveraging modern LM architectures for efficient exploration of diverse ensemble modes and enabling practical applications like structure inpainting with nanobodies. Overall, SLMs offer a flexible, scalable framework to probe protein conformational landscapes with substantial speedups and broad potential extensions to include side chains and ligand contexts.

Abstract

Proteins adopt multiple structural conformations to perform their diverse biological functions, and understanding these conformations is crucial for advancing drug discovery. Traditional physics-based simulation methods often struggle with sampling equilibrium conformations and are computationally expensive. Recently, deep generative models have shown promise in generating protein conformations as a more efficient alternative. However, these methods predominantly rely on the diffusion process within a 3D geometric space, which typically centers around the vicinity of metastable states and is often inefficient in terms of runtime. In this paper, we introduce Structure Language Modeling (SLM) as a novel framework for efficient protein conformation generation. Specifically, the protein structures are first encoded into a compact latent space using a discrete variational auto-encoder, followed by conditional language modeling that effectively captures sequence-specific conformation distributions. This enables a more efficient and interpretable exploration of diverse ensemble modes compared to existing methods. Based on this general framework, we instantiate SLM with various popular LM architectures and propose ESMDiff, a novel BERT-like structure language model fine-tuned from ESM3 with masked diffusion. We verify our approach in various scenarios, including the equilibrium dynamics of BPTI, conformational change pairs, and intrinsically disordered proteins. SLM provides a highly efficient solution, offering a 20-100x speedup over existing methods in generating diverse conformations, shedding light on promising avenues for future research.
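To make the idea of roto-translation invariant structure tokens concrete, here is a minimal sketch and not the paper's dVAE. It assumes a hand-made per-residue feature (sorted distances to the nearest residues) and a random, untrained codebook; the function names `knn_distance_features`, `quantize`, and `rotate_z_and_shift` are hypothetical. It only illustrates that tokens computed from invariant features do not change when the whole structure is rotated and translated.

```python
import math
import random


def knn_distance_features(coords, k=3):
    """Per-residue feature: sorted distances to the k nearest other residues.

    Pairwise distances are unchanged by rigid rotation or translation,
    so these features are roto-translation invariant by construction.
    """
    feats = []
    for i, p in enumerate(coords):
        dists = sorted(math.dist(p, q) for j, q in enumerate(coords) if j != i)
        feats.append(dists[:k])
    return feats


def quantize(feats, codebook):
    """Map each feature vector to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda c: math.dist(f, codebook[c]))
        for f in feats
    ]


def rotate_z_and_shift(coords, angle, shift):
    """Rotate all points about the z-axis by `angle`, then translate by `shift`."""
    ca, sa = math.cos(angle), math.sin(angle)
    return [
        (ca * x - sa * y + shift[0], sa * x + ca * y + shift[1], z + shift[2])
        for x, y, z in coords
    ]


rng = random.Random(0)
# Toy "structure": 12 random 3D points standing in for residue positions.
coords = [tuple(rng.uniform(0, 10) for _ in range(3)) for _ in range(12)]
# Assumption: a random codebook with 16 entries; a real dVAE learns it from data.
codebook = [[rng.uniform(0, 8) for _ in range(3)] for _ in range(16)]

tokens = quantize(knn_distance_features(coords), codebook)
moved = rotate_z_and_shift(coords, angle=1.234, shift=(5.0, -3.0, 2.0))
tokens_moved = quantize(knn_distance_features(moved), codebook)
print(tokens == tokens_moved)
```

Because the language model only ever sees the token sequence, any prior trained on such tokens inherits the invariance without needing an equivariant architecture.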

Paper Structure

This paper contains 65 sections, 32 equations, 14 figures, 10 tables, 5 algorithms.

Figures (14)

  • Figure 1: Residue flexibility (BPTI clusters, shaw2010atomic) reflected by the categorical distribution over latent structure tokens. Different tokens (colored in different shades) are used to encode varying local structural patterns.
  • Figure 2: An illustration of the proposed SLM framework.
  • Figure 3: Autoregressive prior modeling for the latent structure tokens discussed in Section \ref{sec:cslm}.
  • Figure 4: Illustration of the conditional denoising network ${\bm{u}}_\theta({\bm{z}}_t, {\bm{c}})$, where the masked (structure) tokens are colored grey. The unmasked tokens are carried over to the output without update, while each masked token transitions randomly to either an unmasked (✓) or a still-masked (✗) state.
  • Figure 5: Runtime profiling for SLMs and other baseline methods. The number of parameters and the necessary configurations for each model are also noted for reference.
  • ...and 9 more figures
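
The masked-token behaviour described in the Figure 4 caption can be sketched as a small sampling loop. This is a toy illustration and not ESMDiff itself: `toy_denoiser` is a hypothetical stand-in for ${\bm{u}}_\theta({\bm{z}}_t, {\bm{c}})$ that proposes random tokens, `MASK` is a hypothetical id for the masked state, and the unmasking probability $(t - s)/t$ assumes a linear masking schedule, which the source does not specify.

```python
import random

MASK = -1  # hypothetical id for the masked (absorbing) state


def toy_denoiser(tokens, condition, vocab_size, rng):
    """Stand-in for u_theta(z_t, c): proposes one token per position.

    A real model would predict a categorical distribution over structure
    tokens conditioned on the amino-acid sequence; here proposals are random.
    """
    return [rng.randrange(vocab_size) for _ in tokens]


def reverse_step(tokens, condition, t, s, vocab_size, rng):
    """One reverse step from time t to an earlier time s (0 <= s < t <= 1)."""
    proposal = toy_denoiser(tokens, condition, vocab_size, rng)
    p_unmask = (t - s) / t  # assumption: linear masking schedule
    out = []
    for tok, new in zip(tokens, proposal):
        if tok != MASK:
            out.append(tok)   # unmasked tokens are carried over unchanged
        elif rng.random() < p_unmask:
            out.append(new)   # masked -> unmasked
        else:
            out.append(MASK)  # still masked
    return out


def sample(condition, num_steps=10, vocab_size=32, seed=0):
    """Start fully masked and unmask step by step until t reaches 0."""
    rng = random.Random(seed)
    tokens = [MASK] * len(condition)
    for i in range(num_steps, 0, -1):
        t, s = i / num_steps, (i - 1) / num_steps
        tokens = reverse_step(tokens, condition, t, s, vocab_size, rng)
    return tokens


z = sample("MKTAYIAKQR")  # toy amino-acid sequence as the condition c
print(z)
```

In the final step s is 0, so the unmasking probability is 1 and every remaining masked position is filled. The number of network calls equals `num_steps` regardless of sequence length.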