RadDiff: Retrieval-Augmented Denoising Diffusion for Protein Inverse Folding

Jin Han; Tianfan Fu; Wu-Jun Li

TL;DR

RadDiff addresses protein inverse folding by retrieving structurally similar proteins and integrating the external evolutionary knowledge they carry into a diffusion-based sequence design framework. It constructs a residue-level amino acid profile from residue-wise alignments of the retrieved structures and fuses this profile with 3D structure representations via a lightweight integration module, further enhanced by a masked-prior denoising strategy. Empirically, RadDiff achieves state-of-the-art performance on CATH, PDB, and TS50, improving sequence recovery by up to 19% while remaining parameter-efficient and scaling its retrieval as database size grows. This retrieval-augmented diffusion approach enables flexible, up-to-date use of growing protein databases without relying on massive protein language models (PLMs), offering a practical path to evolution-informed protein design.

Abstract

Protein inverse folding, the design of an amino acid sequence for a target 3D structure, is a fundamental problem in computational protein engineering. Existing methods either generate sequences without leveraging external knowledge or rely on protein language models (PLMs). The former neglect the evolutionary information stored in protein databases, while the latter are parameter-inefficient and cannot flexibly adapt to ever-growing protein data. To overcome these drawbacks, we propose a novel method for protein inverse folding, called retrieval-augmented denoising diffusion (RadDiff). Given the target protein backbone, RadDiff uses a hierarchical search strategy to efficiently retrieve structurally similar proteins from large protein databases. The retrieved structures are then aligned residue-by-residue to the target to construct a position-specific amino acid profile, which serves as an evolution-informed prior that conditions the denoising process. A lightweight integration module is further designed to incorporate this prior effectively. Experimental results on the CATH, PDB, and TS50 datasets show that RadDiff consistently outperforms existing methods, improving sequence recovery rate by up to 19%. The results also show that RadDiff generates highly foldable sequences and scales effectively with database size.
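To make the profile construction and the recovery metric concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names `build_profile` and `sequence_recovery`, the representation of each alignment as a mapping from target position to aligned residue, the unweighted counting, and the uniform fallback for positions with no aligned residue are all hypothetical choices. The paper's actual profile may, for example, weight hits by structural similarity.

```python
# Hypothetical sketch: position-specific amino acid profile from
# residue-wise alignments, plus the sequence recovery rate.
# All names and design choices here are illustrative assumptions.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def build_profile(target_length, aligned_hits, pseudocount=0.0):
    """Return a target_length x 20 matrix of per-position frequencies.

    aligned_hits: list of dicts, one per retrieved protein, mapping a
    target residue index to the amino acid aligned at that position.
    Positions a hit does not cover are simply absent from its dict.
    """
    counts = [[pseudocount] * len(AMINO_ACIDS) for _ in range(target_length)]
    for hit in aligned_hits:
        for pos, aa in hit.items():
            # Skip out-of-range positions and non-standard residues.
            if 0 <= pos < target_length and aa in AA_INDEX:
                counts[pos][AA_INDEX[aa]] += 1.0

    profile = []
    for row in counts:
        total = sum(row)
        if total > 0:
            profile.append([c / total for c in row])
        else:
            # Assumption: fall back to a uniform distribution when no
            # retrieved residue aligns to this position.
            profile.append([1.0 / len(AMINO_ACIDS)] * len(AMINO_ACIDS))
    return profile


def sequence_recovery(designed, native):
    """Fraction of positions where the designed residue matches the native one."""
    if len(designed) != len(native) or len(native) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)


# Toy usage: three retrieved proteins aligned to a 4-residue target.
hits = [{0: "A", 1: "C"}, {0: "A", 2: "G"}, {0: "L"}]
profile = build_profile(4, hits)
recovery = sequence_recovery("ACDE", "ACDF")
```

In this toy example, position 0 is covered by three hits (two alanines and one leucine), so its row puts two thirds of the mass on `A` and one third on `L`, while position 3 is uncovered and falls back to uniform. A profile of this kind is what the abstract describes as the prior that conditions the denoising process.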
