Retrieval Augmented Diffusion Model for Structure-informed Antibody Design and Optimization
Zichen Wang, Yaokun Ji, Jianing Tian, Shuangjia Zheng
TL;DR
The paper tackles the challenge of designing antibodies that satisfy explicit structural constraints by introducing RADAb, a retrieval-augmented diffusion framework that conditions sequence design on structure-informed templates. RADAb retrieves CDR-like fragments from a structural database using MASTER. A dual-branch diffusion model then combines global geometric context with local information from these homologous CDR-like fragments to iteratively refine CDR sequences while keeping the backbone fixed; the framework is agnostic to the underlying diffusion architecture. The approach achieves state-of-the-art performance on antibody CDR sequence inverse folding and functionality optimization, with notable gains in amino acid recovery and structural consistency and improved binding energy, including robust results on long CDR-H3 regions and SARS-CoV-2-related cases. These findings highlight the potential of semi-parametric, template-guided diffusion for biomolecular design and point toward more data-efficient antibody engineering in practice. Future work aims at broader motif design and at reducing retrieval-related risks.
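Amino acid recovery (AAR), one of the metrics named above, is commonly computed as the fraction of designed positions whose residue matches the native sequence. The sketch below is a minimal illustration, assuming the designed and native CDR sequences are already aligned and of equal length; the paper's exact evaluation protocol (for example, how scores are averaged over CDRs or samples) is not specified here, and the example sequences are hypothetical.

```python
def amino_acid_recovery(designed: str, native: str) -> float:
    """Fraction of aligned positions where the designed residue equals the native residue."""
    if len(designed) != len(native):
        raise ValueError("designed and native sequences must have equal length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)


# Hypothetical CDR-H3 sequences, for illustration only (not data from the paper).
native = "ARDYYGSSYFDY"
designed = "ARDLYGSSWFDY"
print(f"AAR = {amino_acid_recovery(designed, native):.3f}")  # prints: AAR = 0.833
```

Here two of the twelve positions differ, so the recovery is 10/12.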
Abstract
Antibodies are essential proteins responsible for immune responses in organisms, capable of specifically recognizing the antigen molecules of pathogens. Recent advances in generative models have significantly enhanced rational antibody design. However, existing methods mainly create antibodies from scratch without template constraints, leading to model optimization challenges and unnatural sequences. To address these issues, we propose a retrieval-augmented diffusion framework, termed RADAb, for efficient antibody design. Our method leverages a set of structurally homologous motifs that align with the query structural constraints to guide the generative model in the inverse design and optimization of antibodies according to the desired design criteria. Specifically, we introduce a structure-informed retrieval mechanism that integrates these exemplar motifs with the input backbone through a novel dual-branch denoising module, utilizing both structural and evolutionary information. Additionally, we develop a conditional diffusion model that iteratively refines the optimization process by incorporating both global context and local evolutionary conditions. Our approach is agnostic to the choice of generative model. Empirical experiments demonstrate that our method achieves state-of-the-art performance in multiple antibody inverse folding and optimization tasks, offering a new perspective on biomolecular generative models.
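To make the structure-informed retrieval step more concrete, the sketch below ranks fragments in a toy database by C-alpha RMSD to a query backbone after optimal rigid superposition (Kabsch algorithm), then summarizes the retrieved sequences as a per-position amino acid frequency profile. This is a simplified, hypothetical stand-in: it is a brute-force scan, not MASTER's search algorithm or interface, it uses only C-alpha atoms, and the function names, the `max_rmsd` cutoff, and the profile representation are assumptions for illustration. The paper's dual-branch module may encode retrieved motifs quite differently.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition of P onto Q."""
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                                   # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # correct for a possible reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T       # optimal rotation mapping P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))


def retrieve_fragments(query_ca, database, max_rmsd=1.0, top_k=5):
    """Hypothetical retrieval: keep same-length fragments within max_rmsd, sorted by RMSD."""
    hits = []
    for seq, coords in database:
        if coords.shape != query_ca.shape:
            continue                              # only compare fragments of equal length
        rmsd = kabsch_rmsd(coords, query_ca)
        if rmsd <= max_rmsd:
            hits.append((rmsd, seq))
    hits.sort(key=lambda h: h[0])
    return hits[:top_k]


def position_profile(sequences):
    """Per-position amino acid frequencies, shape (L, 20), from retrieved fragment sequences."""
    if not sequences:
        raise ValueError("no retrieved sequences to build a profile from")
    profile = np.zeros((len(sequences[0]), len(AMINO_ACIDS)))
    for seq in sequences:
        for i, aa in enumerate(seq):
            profile[i, AMINO_ACIDS.index(aa)] += 1
    return profile / len(sequences)


# Toy example: a rotated, translated copy of the query (an exact structural match)
# and an unrelated random fragment. Sequences are hypothetical.
rng = np.random.default_rng(0)
query = rng.normal(size=(6, 3))
theta = 0.7
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
database = [
    ("ARDYYG", query @ Rz.T + 5.0),
    ("GSSYFD", rng.normal(size=(6, 3)) * 3.0),
]
hits = retrieve_fragments(query, database, max_rmsd=1.0)
profile = position_profile([seq for _, seq in hits])
print(hits[0][1], round(hits[0][0], 6))
```

Superposition before computing RMSD matters because retrieved fragments live in arbitrary coordinate frames; only their shape should decide whether they match the query backbone.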
