Distilled Protein Backbone Generation

Liyang Xie, Haoran Zhang, Zhendong Wang, Wesley Tansey, Mingyuan Zhou

TL;DR

The paper tackles the slow sampling inherent in diffusion- and flow-based protein backbone generation. It adapts Score identity Distillation (SiD) to protein backbones by deriving a flow-matching distillation objective and introducing a generator-score network that aligns the teacher and generator scores. It then extends this to few-step generation with inference-time noise scaling. The distilled multistep generators (typically 16–20 steps) reduce sampling time by more than 20-fold while maintaining designability, diversity, and novelty comparable to the Proteína teacher. One-step distillation, by contrast, proves ineffective because of poor designability. A fold-class conditioning study and a biological plausibility case study demonstrate practical applicability, suggesting that this approach enables large-scale in silico protein design and tighter integration with iterative generate–test cycles.
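To make the score-alignment idea concrete, here is a minimal sketch of an SiD-style generator loss in the x0-prediction form of the original SiD method, written as a plain-Python toy on flat coordinate vectors. It assumes `f_teacher` is the teacher's denoised estimate at a noised generator sample x_t, `f_fake` is the generator-score network's estimate at the same x_t, and `x_gen` is the clean generator sample. The function and argument names are hypothetical, and the flow-matching objective derived in the paper may reparameterize this expression. The `alpha` weighting corresponds to the α studied in the Figure 4 ablation.

```python
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sid_generator_loss(
    f_teacher: Sequence[float],
    f_fake: Sequence[float],
    x_gen: Sequence[float],
    alpha: float = 1.0,
    weight: float = 1.0,
) -> float:
    """Hypothetical per-sample SiD-style generator loss (x0-prediction form).

    loss = w(t) * [(1 - alpha) * ||f_teacher - f_fake||^2
                   + (f_teacher - f_fake) . (f_fake - x_gen)]
    """
    # Disagreement between the teacher and the generator-score network.
    diff = [a - b for a, b in zip(f_teacher, f_fake)]
    # Gap between the generator-score estimate and the clean generator sample.
    residual = [a - b for a, b in zip(f_fake, x_gen)]
    return weight * ((1.0 - alpha) * dot(diff, diff) + dot(diff, residual))


# With alpha = 1.0 the squared-difference term drops out.
print(sid_generator_loss([2.0, 1.0], [1.0, 1.0], [0.0, 0.0], alpha=1.0))  # 1.0
```

In real training these quantities would be tensors, the loss would be averaged over a batch and backpropagated to the generator, and the generator-score network would be refit to the generator's current outputs with its own denoising objective. In the original SiD recipe these two updates alternate.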

Abstract

Diffusion- and flow-based generative models have recently demonstrated strong performance in protein backbone generation, offering unprecedented capabilities for de novo protein design. However, despite their generation quality, these models are limited by their generation speed, often requiring hundreds of iterative steps in the reverse-diffusion process. This computational bottleneck limits their practical utility in large-scale protein discovery, where thousands to millions of candidate structures are needed. To address this challenge, we explore score distillation techniques, which have shown great success in reducing the number of sampling steps in the vision domain while maintaining high generation quality. However, a straightforward adaptation of these methods results in unacceptably low designability. Through extensive study, we have identified how to appropriately adapt Score identity Distillation (SiD), a state-of-the-art score distillation strategy, to train few-step protein backbone generators that significantly reduce sampling time while maintaining performance comparable to their pretrained teacher model. In particular, multistep generation combined with inference-time noise modulation is key to this success. We demonstrate that our distilled few-step generators achieve more than a 20-fold improvement in sampling speed while reaching levels of designability, diversity, and novelty similar to those of the Proteina teacher model. This reduction in inference cost enables large-scale in silico protein design, bringing diffusion-based models closer to real-world protein engineering applications. The PyTorch implementation is available at https://github.com/LY-Xie/SiD_Protein
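The abstract credits multistep generation with inference-time noise modulation for the method's success. A minimal sketch of such a sampling loop follows, under stated assumptions: time runs from t = 0 (pure noise) to t = 1 (data), the noised state is x_t = t * x_clean + (1 - t) * noise, the distilled generator maps (x_t, t) to a clean estimate in a single call, and the noise scale multiplies only the Gaussian noise re-injected between steps. The names `multistep_sample` and `generator`, the uniform time grid, and the default `noise_scale` of 0.45 (chosen to echo the designability peak reported in Figure 3) are illustrative assumptions, not the repository's API.

```python
import random
from typing import Callable, List

Vector = List[float]


def multistep_sample(
    generator: Callable[[Vector, float], Vector],
    dim: int,
    num_steps: int = 16,
    noise_scale: float = 0.45,
    seed: int = 0,
) -> Vector:
    """Hypothetical few-step sampler with inference-time noise scaling.

    Assumed convention: t = 0 is pure noise, t = 1 is data, and
    x_t = t * x_clean + (1 - t) * noise.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    rng = random.Random(seed)
    # Uniform time grid 0 = t_0 < t_1 < ... < t_{K-1} < 1 (an assumption).
    times = [k / num_steps for k in range(num_steps)]
    # Start from standard Gaussian noise.
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    x_clean = x
    for k, t in enumerate(times):
        # One network evaluation per step yields a clean estimate.
        x_clean = generator(x, t)
        if k + 1 < num_steps:
            t_next = times[k + 1]
            # Re-noise to the next time level; noise_scale < 1 injects less noise.
            x = [
                t_next * c + (1.0 - t_next) * noise_scale * rng.gauss(0.0, 1.0)
                for c in x_clean
            ]
    return x_clean


# Toy generator that always predicts the origin.
print(multistep_sample(lambda x, t: [0.0 for _ in x], dim=3))  # [0.0, 0.0, 0.0]
```

Setting `noise_scale=1.0` recovers unscaled re-noising. The point the sketch illustrates is that each step costs exactly one network evaluation, so a 16-step generator needs 16 forward passes regardless of how many steps the teacher uses.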

Paper Structure

This paper contains 22 sections, 17 equations, 7 figures, 2 tables, and 2 algorithms.

Figures (7)

  • Figure 1: Designability and diversity versus the number of generation steps. Left: designability, the percentage of generated samples that meet the designability threshold (scRMSD $<$ 2 Å). Right: diversity, the average pairwise TM-score between designable samples; a lower average TM-score indicates more diverse samples. Sixteen steps appear sufficient to surpass the pretrained model in designability, at the cost of slightly worse diversity.
  • Figure 2: Designed protein backbone and adjacent polar–hydrophobic pocket pair. (a) Front view of the designed protein backbone. (b) Cavity view highlighting two adjacent but chemically distinct pockets: Pocket 1 (red, polar) and Pocket 2 (orange, hydrophobic). The complementary polarities enable two use modes: simultaneous binding of a polar and a hydrophobic ligand, or one bifunctional molecule spanning both pockets.
  • Figure 3: Effect of the noise scale on the designability (scRMSD) and diversity (TM-score) of our 10-step generator. Designability is the fraction of samples in the batch with scRMSD $<$ 2 Å; higher is better. Diversity is measured as the average similarity (TM-score) within the batch; lower values indicate more diversity. A noise scale of 1, the standard setting when sampling from diffusion and flow-matching models for images, yields close-to-zero designability for protein structures. Designability peaks at noise scales around 0.45, while the best diversity is reached at 0.8.
  • Figure 4: Ablation study of $\alpha$. Each plot shows generation quality, measured as the average scTM, the average scRMSD, and the scRMSD-based designability within each generated batch, against the number of training samples processed during distillation, for different values of $\alpha$. The ablation shows the effect of $\alpha$ on distillation training and motivates setting $\alpha = 1.0$ in subsequent experiments.
  • Figure 5: Effective sampling time versus the number of generator steps (500 batches generated with a batch size of 10 on an A6000-48GB GPU).
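The designability and diversity metrics used in Figures 1 and 3 reduce to simple batch statistics once per-sample scRMSD values and pairwise TM-scores are available. The sketch below assumes both are precomputed by an external refolding and structural-alignment pipeline, which it does not implement; the function names and the square TM-score matrix layout are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence


def designability(sc_rmsd: Sequence[float], threshold: float = 2.0) -> float:
    """Fraction of samples whose scRMSD (in angstroms) is below the threshold."""
    if not sc_rmsd:
        return 0.0
    return sum(1 for r in sc_rmsd if r < threshold) / len(sc_rmsd)


def diversity(
    sc_rmsd: Sequence[float],
    tm_scores: List[List[float]],
    threshold: float = 2.0,
) -> float:
    """Average pairwise TM-score among designable samples (lower = more diverse).

    tm_scores[i][j] is assumed to hold the precomputed TM-score between
    samples i and j. Returns NaN when fewer than two samples are designable.
    """
    designable = [i for i, r in enumerate(sc_rmsd) if r < threshold]
    pairs = list(combinations(designable, 2))
    if not pairs:
        return float("nan")
    return sum(tm_scores[i][j] for i, j in pairs) / len(pairs)


rmsd = [1.0, 3.0, 1.5, 0.5]  # sample 1 is not designable
tm = [
    [1.0, 0.3, 0.4, 0.6],
    [0.3, 1.0, 0.2, 0.1],
    [0.4, 0.2, 1.0, 0.5],
    [0.6, 0.1, 0.5, 1.0],
]
print(designability(rmsd))  # 0.75
print(diversity(rmsd, tm))  # 0.5
```

Restricting the TM-score average to designable samples, as the Figure 1 caption describes, keeps undesignable outputs from inflating apparent diversity.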