Retrieval Augmented Diffusion Model for Structure-informed Antibody Design and Optimization

Zichen Wang, Yaokun Ji, Jianing Tian, Shuangjia Zheng

TL;DR

The paper addresses antibody design under explicit structural constraints by introducing RADAb, a retrieval-augmented diffusion framework that conditions sequence design on structure-informed templates. RADAb uses MASTER to retrieve CDR-like fragments from a structural database, then applies a dual-branch diffusion model that combines global geometric context with local information from the homologous CDR-like fragments to iteratively refine CDR sequences while keeping the backbone fixed; the framework is agnostic to the underlying generative model. The approach reports state-of-the-art performance on antibody CDR sequence inverse folding and functionality optimization, with gains in amino acid recovery, structural consistency, and binding energy, including robust results on long CDR-H3 regions and SARS-CoV-2-related cases. These findings highlight the potential of semi-parametric, template-guided diffusion for biomolecular design and point toward more data-efficient antibody engineering, with future work aimed at broader motif design and reducing retrieval-related risks.

Abstract

Antibodies are essential proteins responsible for immune responses in organisms, capable of specifically recognizing antigen molecules of pathogens. Recent advances in generative models have significantly enhanced rational antibody design. However, existing methods mainly create antibodies from scratch without template constraints, leading to model optimization challenges and unnatural sequences. To address these issues, we propose a retrieval-augmented diffusion framework, termed RADAb, for efficient antibody design. Our method leverages a set of structural homologous motifs that align with query structural constraints to guide the generative model in inversely optimizing antibodies according to desired design criteria. Specifically, we introduce a structure-informed retrieval mechanism that integrates these exemplar motifs with the input backbone through a novel dual-branch denoising module, utilizing both structural and evolutionary information. Additionally, we develop a conditional diffusion model that iteratively refines the optimization process by incorporating both global context and local evolutionary conditions. Our approach is agnostic to the choice of generative models. Empirical experiments demonstrate that our method achieves state-of-the-art performance in multiple antibody inverse folding and optimization tasks, offering a new perspective on biomolecular generative models.
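
To make the idea of combining a global context with local information from retrieved motifs more concrete, here is a minimal toy sketch in Python. It is not the RADAb denoising network: the position-wise frequency profile, the fixed linear mix, and the names `fragment_profile`, `blend`, and `greedy_decode` are all assumptions made for illustration, whereas the paper describes a learned dual-branch module applied over diffusion steps. The sketch also assumes that every retrieved fragment has the same length as the CDR being designed and contains only the 20 standard residues.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def fragment_profile(fragments, pseudocount=1.0):
    """Position-wise residue frequencies over equal-length fragments.

    Hypothetical stand-in for the "local" information carried by
    retrieved CDR-like fragments; assumes standard residues only.
    """
    if not fragments:
        raise ValueError("need at least one retrieved fragment")
    length = len(fragments[0])
    if any(len(f) != length for f in fragments):
        raise ValueError("all fragments must have the same length")
    total = len(fragments) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for pos in range(length):
        counts = Counter(f[pos] for f in fragments)
        profile.append(
            {aa: (counts.get(aa, 0) + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return profile


def blend(global_probs, local_probs, weight=0.5):
    """Linearly mix two per-position distributions.

    `weight` is the share given to the local (retrieved) branch.
    A fixed mix is an illustrative assumption, not a learned module.
    """
    return [
        {aa: (1 - weight) * g[aa] + weight * l[aa] for aa in AMINO_ACIDS}
        for g, l in zip(global_probs, local_probs)
    ]


def greedy_decode(probs):
    """Pick the most probable residue at each position."""
    return "".join(max(p, key=p.get) for p in probs)


# Toy usage: three made-up fragments and a uniform placeholder
# standing in for the global-context branch.
fragments = ["ARDY", "ARGY", "AKDY"]
local = fragment_profile(fragments)
uniform = [{aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS} for _ in fragments[0]]
print(greedy_decode(blend(uniform, local, weight=0.5)))
```

With a uniform placeholder for the global branch, the decoded sequence simply follows the majority residue at each position of the retrieved fragments; in a real model the global branch would shift that choice according to the antibody-antigen context.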

Paper Structure

This paper contains 31 sections, 8 equations, 6 figures, 5 tables, 3 algorithms.

Figures (6)

  • Figure 1: Illustration of the retrieval-augmented framework.
  • Figure 2: The overall architecture of the proposed RADAb framework. (A) Structural retrieval process: the CDR backbone is input into MASTER, which outputs a set of ranked CDR-like fragments. (B) Diffusion process and denoising network, which takes the antibody-antigen context and retrieved evolutionary information as conditions. The structure is fixed during the diffusion process. (C) Our method restricts the antibody to a small region through fixed structural constraints and retrieval-augmented (functional) constraints to achieve higher fitness.
  • Figure 3: Left: Distribution of the samples' interface energy. Right: Generated CDR-H3 samples and the original structure of PDB: 7d6i antigen-antibody complex. The gray part represents the antibody framework, the red part represents the CDR, and the blue part represents the antigen (with the darker shade indicating the antigen epitope).
  • Figure 4: Ablation study. G denotes using the Ground truth as the retrieval result, R denotes the Retrieval-augmented mechanism, and E denotes the Evolutionary embedding mechanism.
  • Figure S1: AAR distribution across CDR-H3 lengths
  • ...and 1 more figures
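
Figure S1 reports AAR (amino acid recovery) by CDR-H3 length. As a rough illustration of what that metric measures, the Python sketch below computes the fraction of positions at which a designed sequence matches the native one, then averages it per sequence length. The function names and the example sequences are hypothetical, and the paper's exact evaluation protocol (for example, how samples are aggregated) may differ.

```python
from collections import defaultdict


def amino_acid_recovery(designed, native):
    """Fraction of positions where the designed residue equals the native one."""
    if len(designed) != len(native):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)


def mean_aar_by_length(pairs):
    """Average AAR grouped by sequence length.

    `pairs` is an iterable of (designed, native) strings; the grouping
    loosely mirrors a per-length breakdown such as the one in Figure S1.
    """
    buckets = defaultdict(list)
    for designed, native in pairs:
        buckets[len(native)].append(amino_acid_recovery(designed, native))
    return {length: sum(v) / len(v) for length, v in sorted(buckets.items())}


# Made-up sequences, for illustration only.
pairs = [("ARDY", "ARDW"), ("ARGY", "ARGY"), ("ARDYGS", "ARDYGT")]
print(mean_aar_by_length(pairs))
```

Grouping by length matters because longer CDR-H3 loops are typically harder to recover, so a single pooled average can hide weak performance on long loops.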