AntibodyFlow: Normalizing Flow Model for Designing Antibody Complementarity-Determining Regions
Bohao Xu, Yanbo Wang, Wenyu Chen, Shimin Shan
TL;DR
AntibodyFlow designs 3D antibody CDR loops by representing each loop as a distance matrix $\mathbf{D}$ and an amino-acid sequence $\mathbf{S}$, and modeling their joint distribution with a two-phase normalizing flow: $f_{\mathbf{D}}$ generates $\mathbf{D}$, and $f_{\mathbf{S}|\mathbf{D}}$ generates $\mathbf{S}$ conditioned on $\mathbf{D}$. A differentiable constraint-learning component enforces bond-length and open-loop validity, and a constrained coordinate-generation step reconstructs 3D coordinates $\mathbf{G}$ from $\mathbf{D}$ under these geometric constraints. On SAbDab and CoV-AbDab, AntibodyFlow achieves higher validity rates (VR) and lower RMSD than baselines, with up to a 16.0% relative VR improvement and a 24.3% relative RMSD reduction, and it yields better SARS-CoV-2 neutralization predictions. Overall, the work shows that combining distance-based geometric priors, conditional sequence generation, and geometry-aware optimization can substantially advance de novo antibody design, with practical therapeutic implications.
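To make the distance-matrix representation concrete, the following minimal sketch builds $\mathbf{D}$ from 3D coordinates, shows that it is unchanged when the loop is rotated and translated, and runs a simple bond-length check on consecutive residues. It is an illustration only, not the paper's implementation: the helper names, the four-point toy loop, and the 3.8 Å target with 0.5 Å tolerance (a typical spacing between consecutive alpha carbons) are all assumptions made for this example.

```python
import math

def distance_matrix(coords):
    """Pairwise Euclidean distances between 3D points (one point per residue)."""
    return [[math.dist(p, q) for q in coords] for p in coords]

def rotate_z(point, angle):
    """Rotate a 3D point about the z-axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def bond_lengths_valid(D, target=3.8, tol=0.5):
    """Check that consecutive residues sit near an assumed target distance.

    The target and tolerance (in angstroms) are illustrative assumptions,
    not values taken from the paper.
    """
    return all(abs(D[i][i + 1] - target) <= tol for i in range(len(D) - 1))

# Hypothetical 4-residue loop; coordinates are made up for illustration.
loop = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (3.8, 3.8, 3.8)]
D = distance_matrix(loop)

# Apply a rigid motion: rotate about z, then translate by 5 along each axis.
moved = [tuple(c + 5.0 for c in rotate_z(p, 0.7)) for p in loop]
D_moved = distance_matrix(moved)

# The distance matrix is the same before and after the rigid motion.
same = all(
    math.isclose(D[i][j], D_moved[i][j], abs_tol=1e-9)
    for i in range(len(D))
    for j in range(len(D))
)
```

Because $\mathbf{D}$ is invariant to rotation and translation, a model that generates $\mathbf{D}$ does not need to account for the arbitrary pose of the loop; the cost is that 3D coordinates $\mathbf{G}$ must be recovered from $\mathbf{D}$ in a separate step.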
Abstract
Therapeutic antibodies have been extensively studied in drug discovery and development over the past decades. Antibodies are specialized protective proteins that bind to antigens in a lock-and-key manner. The binding strength (affinity) between an antibody and a specific antigen is largely determined by the complementarity-determining regions (CDRs) of the antibody. Existing machine learning methods cast in silico development of CDRs as either sequence generation or 3D graph generation (with a single chain) and have achieved initial success. However, because CDR loops have specific geometric shapes, learning the 3D geometric structures of CDRs remains a challenge. To address this issue, we propose AntibodyFlow, a 3D flow model for designing antibody CDR loops. Specifically, AntibodyFlow first constructs the distance matrix, then predicts amino acids conditioned on the distance matrix. In addition, AntibodyFlow performs constraint learning and constrained generation to ensure valid 3D structures. Experimental results indicate that AntibodyFlow consistently outperforms the best baseline, with up to 16.0% relative improvement in validity rate and 24.3% relative reduction in geometric graph-level error (root mean square deviation, RMSD).
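The two reported quantities, RMSD and relative change against a baseline, can be illustrated with a short sketch. The function names and all numbers below are hypothetical and chosen only to make the arithmetic easy to follow; they are not results from the paper. The RMSD helper also assumes the predicted and reference coordinates are already superimposed, whereas structural evaluations commonly align the two structures first.

```python
import math

def rmsd(pred, ref):
    """Root mean square deviation between matched 3D points.

    Assumes `pred` and `ref` are already superimposed and listed in the
    same residue order; no alignment is performed here.
    """
    if len(pred) != len(ref) or not pred:
        raise ValueError("pred and ref must be non-empty and of equal length")
    squared = sum(math.dist(p, r) ** 2 for p, r in zip(pred, ref))
    return math.sqrt(squared / len(pred))

def relative_reduction(baseline, model):
    """Fractional decrease of an error metric (lower is better)."""
    return (baseline - model) / baseline

def relative_improvement(baseline, model):
    """Fractional increase of a score (higher is better)."""
    return (model - baseline) / baseline

# Made-up coordinates: each predicted point is off by exactly 1 angstrom.
ref = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
pred = [(0.0, 0.0, 1.0), (3.8, 0.0, -1.0)]
error = rmsd(pred, ref)

# Made-up metric values, used only to show how the percentages are formed.
rmsd_drop = relative_reduction(baseline=2.0, model=1.5)    # 25% lower RMSD
vr_gain = relative_improvement(baseline=0.80, model=0.90)  # about 12.5% higher VR
```

Read this way, a 24.3% relative RMSD reduction means the model's RMSD is 24.3% lower than the baseline's, not 24.3% in absolute units, and a 16.0% relative improvement in validity rate is likewise measured against the baseline's rate.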
