Latent Retrieval Augmented Generation of Cross-Domain Protein Binders

Zishen Zhang, Xiangzhe Kong, Wenbing Huang, Yang Liu

TL;DR

This work addresses the challenge of designing site-specific protein binders by marrying retrieval of known interfaces with generative design. It introduces RADiAnce, a retrieval-augmented diffusion framework that operates in a contrastive, cross-domain latent space to align binder and interface representations and guide generation with retrieved motifs. The approach delivers superior performance in both peptide and antibody codesign, demonstrates cross-domain benefits, and shows potential for de novo binder design without bound structures, highlighting practical impact for drug discovery. The study also analyzes retrieval quantity, proposes adaptive retrieval, and discusses reproducibility and data-split integrity to ensure robust benchmarking.

Abstract

Designing protein binders that target specific sites, which requires generating realistic and functional interaction patterns, is a fundamental challenge in drug discovery. Current structure-based generative models are limited in generating interfaces with sufficient rationality and interpretability. In this paper, we propose Retrieval-Augmented Diffusion for Aligned interface (RADiAnce), a new framework that leverages known interfaces to guide the design of novel binders. By unifying retrieval and generation in a shared contrastive latent space, our model efficiently identifies relevant interfaces for a given binding site and seamlessly integrates them through a conditional latent diffusion generator, enabling cross-domain interface transfer. Extensive experiments show that RADiAnce significantly outperforms baseline models across multiple metrics, including binding affinity and recovery of geometries and interactions. Additional experimental results validate cross-domain generalization, demonstrating that retrieving interfaces from diverse domains, such as peptides, antibodies, and protein fragments, enhances the generation performance of binders for other domains. Our work establishes a new paradigm for protein binder design that successfully bridges retrieval-based knowledge and generative AI, opening new possibilities for drug discovery.
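The retrieval step described in the abstract can be illustrated with a toy sketch: a binding-site latent acts as a query against a database of binder latents that live in the same space. This is not the authors' implementation. The use of cosine similarity, the function names (`cosine_similarity`, `retrieve_top_k`), the 3-dimensional vectors, and the value of `k` are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two latent vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(
    query: Sequence[float],
    keys: Sequence[Sequence[float]],
    k: int,
) -> List[Tuple[int, float]]:
    """Return (index, similarity) pairs for the k keys most similar to the query."""
    scored = [(i, cosine_similarity(query, key)) for i, key in enumerate(keys)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy database: each row stands in for the latent of a known
# binder interface (peptide, antibody, or protein fragment).
database_keys = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [-1.0, 0.0, 0.0],
]

# Hypothetical latent of the target binding site, assumed to share the space.
site_query = [1.0, 0.05, 0.0]

top_hits = retrieve_top_k(site_query, database_keys, k=2)
for index, score in top_hits:
    print(f"entry {index}: similarity {score:.4f}")
```

In a full system, the retrieved latents would then condition the generator. The sketch stops at retrieval because the abstract does not specify the conditioning mechanism.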

Paper Structure

This paper contains 59 sections, 14 equations, 6 figures, 15 tables, 1 algorithm.

Figures (6)

  • Figure 1: Visualization of interface similarity across antibodies, proteins, and peptides, highlighting similar interaction patterns among diverse binder types.
  • Figure 2: Overview of RADiAnce. (A) Cross-domain binding sites and binders are encoded into key/value latents and trained with a contrastive loss to derive a retrievable binder database. (B) Contrastive VAE aligns binding site and binder latents for accurate retrieval and conditional diffusion. (C) Conditional diffusion generator leverages the retrieved latents through cross-attention at every reverse step, progressively refining noisy features into sterically and chemically consistent complexes.
  • Figure 3: Free energy changes during iterative CDR design.
  • Figure 4: Examples of de novo antibody designs targeting the HIV-1 receptor CD4. Each case shows the final docked complex with the redesigned antibody interacting with the target epitope.
  • Figure 5: Interaction overlap analysis.
  • ...and 1 more figure