De novo antibody design with SE(3) diffusion

Daniel Cutting, Frédéric A. Dreyer, David Errington, Constantin Schneider, Charlotte M. Deane

TL;DR

IgDiff introduces an SE(3) diffusion model for antibody backbone design, extending prior backbone diffusion methods to jointly generate paired heavy and light chain variable domains. Trained on ABB2-predicted structures from the Observed Antibody Space (OAS), IgDiff learns to produce designable, novel backbones with diverse CDR loops, for which compatible sequences can be designed with AbMPNN, enabling both unconditional and conditional design tasks. On conditional (inpainting) design tasks, IgDiff shows improved designability and canonical-cluster agreement compared to RFDiffusion, and all 28 experimentally tested designs expressed at high yield. The work highlights the viability of end-to-end, structure-based diffusion for antibody engineering and its potential to accelerate therapeutic antibody design and optimization.

Abstract

We introduce IgDiff, an antibody variable domain diffusion model based on a general protein backbone diffusion framework which was extended to handle multiple chains. Assessing the designability and novelty of the structures generated with our model, we find that IgDiff produces highly designable antibodies that can contain novel binding regions. The backbone dihedral angles of sampled structures show good agreement with a reference antibody distribution. We verify these designed antibodies experimentally and find that all express with high yield. Finally, we compare our model with a state-of-the-art generative backbone diffusion model on a range of antibody design tasks, such as the design of the complementarity determining regions or the pairing of a light chain to an existing heavy chain, and show improved properties and designability.

Paper Structure (6 sections, 5 equations, 4 figures)

Figures (4)

  • Figure 1: Left: Schematic representation of an antibody. Centre: Backbone atoms of the variable region, showing both a heavy (green) and light (blue) chain domain. Right: The parametrisation of residues into frames used by the diffusion model, with each frame consisting of four heavy atoms connected by rigid covalent bonds.
  • Figure 2: Examples of IgDiff-generated antibody structures. Light chains are highlighted in blue, heavy chains in green. (A-B) Unconditionally generated antibodies. (A) IgDiff-generated antibody (dark green/dark blue) compared to the ABodyBuilder2 prediction for the AbMPNN-designed sequence with the lowest self-consistency RMSD (light green/cyan) and to the closest match in the training set by TM-score (grey). (B) Multiple antibodies generated by IgDiff with the same heavy and light chain length settings. (C-E) Conditionally generated antibodies: designed regions in orange, non-designed regions in dark blue/dark green, original input structure in cyan/light green. (C) Conditional generation of all CDR loops. (D) Conditional generation of CDR H3 with a different length from the input structure. (E) Conditional generation of the entire light chain.
  • Figure 3: Left: Ramachandran plot of the dihedral angle distribution comparing the heavy chain residues from the predicted structures of OAS to IgDiff. Right: Same comparison for the light chain.
  • Figure 4: Left: Novelty of generated CDR H3 structures compared to random samples from the OAS dataset. For each structure, the RMSD shown is the CDR H3 RMSD to the closest match in the OAS dataset by TM-score with the same CDR H3 length. Right: Pairwise RMSD between IgDiff-generated CDR loops of the same length.

We also consider the diversity of antibodies generated by IgDiff, an important consideration for antibody engineering. Taking CDRs with the same loop length, regardless of the specified chain length, we calculate the pairwise RMSD between loops. This is shown in Fig. 4 (right), where we observe diversity of generated CDR loops across different samples. Interestingly, we observe a bimodal distribution of pairwise RMSDs for CDRs L1, L3 and H2, arising from pairs of generated loops that either assume the same or different canonical conformations. The distribution of loop lengths as a function of specified chain length, and an exploration of sequence diversity, are discussed further in Appendix \ref{app:unconditioned}.

To demonstrate that IgDiff-generated antibodies express and exhibit favourable designability characteristics, we selected 28 antibody sequences for experimental validation. Before selection, we excluded any sequence whose CDRH3 scRMSD or overall scRMSD was above the median. As detailed in Appendix \ref{app:conditioned}, all selected antibodies expressed and yielded a sufficiently high concentration for downstream characterisation.

Targeted design of specific regions of an antibody is of interest in a number of important applications. The engineering of the interface region, notably the CDR loops, is of particular relevance for therapeutic applications greiffabsci. Furthermore, pairing an appropriate heavy or light chain with a given domain is important in antibody discovery and in shaping antibody repertoires pairing.
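The median-based pre-filter applied before choosing antibodies for expression can be sketched as below. This is a minimal illustration, not the authors' code: the dictionary keys `cdrh3_scrmsd` and `overall_scrmsd` are hypothetical names, and how the 28 sequences were then picked from the surviving pool is not described in the text, so that step is not modelled.

```python
import statistics


def prefilter_by_median(designs):
    """Drop designs whose CDR H3 or overall scRMSD lies above the median.

    `designs` is a list of dicts; the keys "cdrh3_scrmsd" and "overall_scrmsd"
    (values in Å) are hypothetical names chosen for this sketch.
    """
    h3_median = statistics.median(d["cdrh3_scrmsd"] for d in designs)
    overall_median = statistics.median(d["overall_scrmsd"] for d in designs)
    # A design survives only if it is at or below BOTH medians.
    return [
        d
        for d in designs
        if d["cdrh3_scrmsd"] <= h3_median and d["overall_scrmsd"] <= overall_median
    ]


designs = [
    {"name": "a", "cdrh3_scrmsd": 0.5, "overall_scrmsd": 0.6},
    {"name": "b", "cdrh3_scrmsd": 1.5, "overall_scrmsd": 0.7},
    {"name": "c", "cdrh3_scrmsd": 0.9, "overall_scrmsd": 1.2},
    {"name": "d", "cdrh3_scrmsd": 0.8, "overall_scrmsd": 0.8},
]
print([d["name"] for d in prefilter_by_median(designs)])  # ['a']
```

Design "d" is dropped even though its CDR H3 scRMSD is below the median, because the two criteria are combined with a logical AND.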
We thus consider the performance of IgDiff on conditional sampling for several inpainting design tasks of particular relevance to antibody engineering. The tasks we consider are (i) the design of all CDR loops given a fixed heavy and light chain framework, (ii) the design of a complete light chain given a heavy chain, and (iii) the design of a CDRH3 loop of varying length given the remaining variable region, which allows a longer loop to make additional contacts. We generate 10 structures per reference structure and CDR length. All starting-point antibodies are taken from the ABB2 test set. To perform inpainting, at each time $t$ during inference we replace each frame in the fixed region with the corresponding frame from the reference structure after applying the forward noising process to time $t$. In Table \ref{tab:conditioned_benchmark}, we show self-consistency metrics and compare IgDiff against RFDiffusion on each design task. None of the light chains designed by RFDiffusion can be parsed by ANARCI anarci after inverse folding, while IgDiff always generates valid light chains, of which 93.3% pass both our confidence and scRMSD tests. In the CDRH3 length change task, 74% of IgDiff-generated structures pass our combined test, compared with only 6% of RFDiffusion designs. For the task of designing all CDR loops, none of the RFDiffusion structures pass the scRMSD test, while 4% of the IgDiff structures have scRMSD below 2 Å across all regions. This low success rate for IgDiff is driven primarily by a high scRMSD in the CDRH1 loop; the remaining regions each have a success rate above 75%. RFDiffusion, in contrast, has poor self-consistency in CDRH3, CDRL1 and CDRL3 as well as in CDRH1. Further analysis of the scRMSD distribution and canonical clusters of designs, as well as details of the starting-point antibodies used for each design task, are given in Appendix \ref{app:conditioned}.
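The replacement-style inpainting step described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not IgDiff's implementation: only frame translations are noised (IgDiff, like FrameDiff, also diffuses frame rotations on SO(3), which is omitted here), the schedule `alpha_bar` is a hypothetical choice, and `toy_reverse_step` is a placeholder for the learned denoising network.

```python
import numpy as np


def alpha_bar(t, beta=4.0):
    # Hypothetical variance-preserving schedule with alpha_bar(0) = 1.
    return np.exp(-beta * t)


def forward_noise(x0, t, rng):
    """Sample from q(x_t | x_0) for frame translations (rotations omitted)."""
    a = alpha_bar(t)
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * rng.standard_normal(x0.shape)


def toy_reverse_step(x, t, t_next):
    # Stand-in for the learned reverse (denoising) update; NOT IgDiff's network.
    return 0.9 * x


def inpaint(x_ref, fixed_mask, n_steps=100, seed=0):
    """Replacement-based inpainting on (N, 3) translations.

    x_ref: reference coordinates; fixed_mask: boolean (N,) array marking the
    residues kept from the reference (e.g. the framework when redesigning CDRs).
    """
    rng = np.random.default_rng(seed)
    ts = np.linspace(1.0, 0.0, n_steps + 1)
    x = rng.standard_normal(x_ref.shape)  # pure noise at t = 1
    x[fixed_mask] = forward_noise(x_ref[fixed_mask], ts[0], rng)
    for t, t_next in zip(ts[:-1], ts[1:]):
        x = toy_reverse_step(x, t, t_next)
        # Replace the fixed region with the reference noised to the new time.
        x[fixed_mask] = forward_noise(x_ref[fixed_mask], t_next, rng)
    return x


x_ref = np.arange(30, dtype=float).reshape(10, 3)
mask = np.zeros(10, dtype=bool)
mask[:6] = True  # keep residues 0-5, generate residues 6-9
out = inpaint(x_ref, mask)
print(np.allclose(out[mask], x_ref[mask]))  # True: at t = 0 the fixed region is the reference
```

Because the fixed region is always re-sampled from the forward process at the current time, the network only ever sees a noise level consistent with $t$ in that region, and at $t = 0$ the fixed frames coincide with the reference exactly.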
| Task | IgDiff scRMSD | IgDiff confidence | IgDiff combined | RFDiffusion scRMSD | RFDiffusion confidence | RFDiffusion combined |
|---|---|---|---|---|---|---|
| Unconditioned | 0.88 | 0.79 | 0.76 | - | - | - |
| Design all CDRs | 0.04 | 0.80 | 0.02 | 0.00 | 0.54 | 0.00 |
| Change CDRH3 length | 0.76 | 0.91 | 0.74 | 0.08 | 0.69 | 0.06 |
| Design light chain | 0.93 | 1.00 | 0.93 | - | - | - |

Success rate of unconditioned and inpainted samples for each design task under three different tests, comparing IgDiff and RFDiffusion. The scRMSD test requires every region independently to have scRMSD < 2 Å. The confidence test requires the ABB2 RMSPE averaged over all residues to be less than the 90th percentile of the same metric evaluated on the ABB2 test set. The combined test reports the fraction of designs passing both tests.

In this article, we introduce IgDiff, a model for de novo antibody generation. It is derived from the recent SE(3) diffusion framework FrameDiff by fine-tuning on antibody variable domains. The weights of our IgDiff model are publicly available zenodo. We show that our antibody backbone model recapitulates the expected backbone dihedral distribution, and we study the validity of the sequences recovered from generated samples using an antibody-specific inverse folding model. Assessing the designability of the generated structures by comparing them with structure predictions based on the corresponding sequences, we find excellent agreement. We probed our model for novelty by finding the closest match in the training data for each sampled structure, and found that it can generate structures distinct from those in the training set. We selected 28 designs for experimental validation and found that all of them express with high yield. We further found that IgDiff generates diverse antibodies, particularly in the CDR loops that are important determinants of binding affinity, making it well suited to antibody design tasks.
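The three success tests defined in the caption of the conditional-benchmark table can be sketched as a single function. This assumes per-region scRMSD values and residue-averaged ABB2 RMSPE values have already been computed; the argument names are hypothetical, and using NumPy's default (linear) percentile interpolation for the 90th-percentile threshold is an assumption, not something the text specifies.

```python
import numpy as np


def success_rates(region_scrmsd, mean_rmspe, test_set_rmspe, cutoff=2.0, pct=90):
    """Fractions of designs passing the scRMSD, confidence and combined tests.

    region_scrmsd: (n_designs, n_regions) scRMSD values in Å.
    mean_rmspe: (n_designs,) ABB2 RMSPE averaged over residues per design.
    test_set_rmspe: the same averaged metric on the ABB2 test set.
    """
    region_scrmsd = np.asarray(region_scrmsd, dtype=float)
    mean_rmspe = np.asarray(mean_rmspe, dtype=float)
    # scRMSD test: every region must independently fall below the cutoff.
    passes_scrmsd = np.all(region_scrmsd < cutoff, axis=1)
    # Confidence test: below the chosen percentile of the test-set distribution.
    threshold = np.percentile(test_set_rmspe, pct)
    passes_conf = mean_rmspe < threshold
    passes_both = passes_scrmsd & passes_conf
    return {
        "scRMSD": float(passes_scrmsd.mean()),
        "Confidence": float(passes_conf.mean()),
        "Combined": float(passes_both.mean()),
    }


rates = success_rates(
    region_scrmsd=[[0.5, 1.0], [2.5, 0.3], [1.0, 1.9], [0.2, 0.2]],
    mean_rmspe=[1.0, 2.0, 9.5, 3.0],
    test_set_rmspe=np.arange(1, 11),
)
print(rates)  # {'scRMSD': 0.75, 'Confidence': 0.75, 'Combined': 0.5}
```

The combined rate can never exceed either individual rate, which is a quick sanity check on the table rows (for example 0.93 / 1.00 / 0.93 for light chain design).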
Finally, we considered examples of antibody engineering tasks, such as the redesign of CDR loops or the generation of a light chain paired to a specified heavy chain, and demonstrated the applicability of our approach in practical use cases. We designed metrics to assess the quality of generated structures, and showed substantial improvement over RFDiffusion, a state-of-the-art protein diffusion model. Diffusion models trained on antibodies offer a promising approach to accelerate drug design through data-driven generative AI. In this article, we provide a key step towards the generation of viable therapeutic antibodies through structure-based diffusion. Conditioning the generation of samples to express desired properties, as well as to target specified antigens, will be a crucial step towards facilitating their application in therapeutic development.

We are grateful to Henry Kenlay, Claire Marks, Douglas Pires, Aleksandr Kovaltsuk and Newton Wahome for useful discussions.

In Figure \ref{fig:yield}, we show the expression yields for 28 selected antibody sequences predicted with AbMPNN from full unconditioned variable-domain structures produced with IgDiff.

[Figure: Expression yields (mg/L) of a validation set of IgDiff-generated antibodies.]

We assess the scRMSD of IgDiff-generated antibody structures to the corresponding ABB2-predicted structures. To this end, we design 20 amino acid sequences for each IgDiff-generated backbone using AbMPNN and use ABB2 to predict the structure each sequence is likely to assume. For each IgDiff output, we report the scRMSDs for the AbMPNN sequence whose prediction achieves the lowest overall scRMSD of the 20, as shown in Fig. \ref{fig:scrmsd}. A low scRMSD in this setting indicates that the IgDiff-generated antibody structures are realistic and designable.
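The best-of-20 self-consistency selection can be sketched as below. The sketch assumes the ABB2 re-predictions are already available as coordinate arrays (no AbMPNN or ABB2 interface is called) and that scRMSD is computed after optimal rigid superposition with the Kabsch algorithm; the text does not state the alignment procedure or which backbone atoms are used, so both are assumptions.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition."""
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rot.T - q
    return float(np.sqrt((diff ** 2).sum() / len(p)))


def best_self_consistent(generated, repredicted):
    """Pick the designed sequence whose re-predicted structure best matches the backbone.

    generated: (N, 3) backbone coordinates from the diffusion model.
    repredicted: list of (N, 3) arrays, one per designed sequence (in the text,
    20 AbMPNN sequences re-folded with ABB2; neither tool is called here).
    Returns (index, overall scRMSD) of the best sequence.
    """
    scores = [kabsch_rmsd(generated, x) for x in repredicted]
    best = int(np.argmin(scores))
    return best, scores[best]


rng = np.random.default_rng(0)
gen = rng.normal(size=(12, 3))
theta = 0.5
rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
candidates = [
    gen + 0.5 * rng.normal(size=gen.shape),
    gen @ rz.T + np.array([1.0, 2.0, 3.0]),  # rigidly moved copy: scRMSD ~ 0
    gen + 1.0 * rng.normal(size=gen.shape),
]
idx, score = best_self_consistent(gen, candidates)
print(idx, round(score, 6))  # 1 0.0
```

Per-region scRMSDs for the selected sequence could then be computed by applying `kabsch_rmsd` to the coordinate subsets of each region, although whether regions were superimposed individually or globally is not stated in the text.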
The structures are designable in that the scRMSD values in all loops are typically below the 2 Å cutoff used to define designability in the original FrameDiff paper framediff. All IgDiff-generated antibodies pass the 2 Å threshold across the entire antibody, and 88% of IgDiff-generated antibodies also pass for every sub-region assessed independently in Fig. \ref{fig:scrmsd}, including all CDR loops.

[Figure: scRMSD of IgDiff structures against ABB2 re-predictions, compared to the inherent ABB2 RMSD. (Left) IgDiff scRMSD to the ABB2-predicted structure for the AbMPNN sequence producing the lowest overall scRMSD to the input IgDiff structure. The red line indicates the 2 Å cutoff used to designate a designable structure in FrameDiff framediff. (Right) RMSD of ABB2 predictions on the ABB2 test set against the ground-truth crystal structures. The mean IgDiff scRMSD for each region is lower than the corresponding ABB2 test-set RMSD.]

We further assessed self-consistency through the canonical clusters assumed by the non-H3 CDR loops in both IgDiff-generated structures and ABB2 re-predicted structures (generated as described for the scRMSD calculation). To this end, each loop is assigned to the closest PyIGClassify cluster centre by RMSD adolf2015pyigclassify. If a loop has an unusual length for which no clusters exist, or has $\text{RMSD} > 1.5$ Å to the closest cluster centre, it is denoted as unclassified. For the less variable CDRs 1 and 2, IgDiff structures and ABB2 re-predicted structures were assigned the same canonical class in between 92% (CDR H2) and 98% (CDR H1) of samples (for CDR H2, all remaining samples are unclassified in both structures), and in 85% of samples for the more variable CDR L3. Up to 8% (CDR H2) of generated designs fell outside the known canonical classes for that loop in both the IgDiff-generated structure and the ABB2 re-predicted structure (see Table \ref{tab:unconditional_clusters}).
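The cluster-assignment rule (closest same-length centre, unclassified beyond 1.5 Å) can be sketched as follows. This is a minimal illustration, assuming loop coordinates have already been superimposed on the cluster centres; the dictionary layout and the cluster names are illustrative and do not reproduce PyIGClassify's actual data format.

```python
import numpy as np


def assign_cluster(loop, centres, threshold=1.5):
    """Assign a CDR loop to its closest canonical cluster centre by RMSD.

    loop: (L, 3) coordinates, assumed already superimposed on the centres.
    centres: dict mapping a cluster name to an (L', 3) array.
    Returns "unclassified" if no centre has the loop's length or if the closest
    centre is more than `threshold` Å away.
    """
    best_name, best_rmsd = None, np.inf
    for name, centre in centres.items():
        if centre.shape != loop.shape:
            continue  # canonical clusters are defined per loop length
        rmsd = float(np.sqrt(((loop - centre) ** 2).sum(axis=1).mean()))
        if rmsd < best_rmsd:
            best_name, best_rmsd = name, rmsd
    if best_name is None or best_rmsd > threshold:
        return "unclassified"
    return best_name


centres = {
    "L1-11-1": np.zeros((11, 3)),
    "L1-11-2": np.full((11, 3), 3.0),
    "L1-16-1": np.zeros((16, 3)),
}
print(assign_cluster(np.full((11, 3), 0.5), centres))  # L1-11-1 (RMSD ~0.87 Å)
print(assign_cluster(np.full((11, 3), 1.5), centres))  # unclassified (RMSD ~2.6 Å)
print(assign_cluster(np.zeros((13, 3)), centres))      # unclassified (no length-13 cluster)
```

Running the same function on the IgDiff loop and on the corresponding ABB2 re-predicted loop yields the pair of labels that the self-consistency comparison is based on.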
We further show the RMSD to the assigned cluster centre for each loop generated by IgDiff and re-predicted by ABB2, as well as for a baseline of paired OAS structures predicted with ABB2 (see Fig. \ref{fig:cluster-rmsd}). The RMSD to the closest cluster centre depends heavily on the canonical class of the loop, and IgDiff-generated antibodies follow a similar distribution of distances to the closest cluster centre as both the ABB2 re-predicted structures and the baseline set of OAS-derived, ABB2-predicted structures.

| Region | Matching clusters | Mismatching clusters | Both unclassified |
|---|---|---|---|
| CDRH1 | 0.98 | 0.01 | 0.01 |
| CDRH2 | 0.92 | 0.00 | 0.08 |
| CDRL1 | 0.93 | 0.03 | 0.04 |
| CDRL2 | 0.95 | 0.01 | 0.04 |
| CDRL3 | 0.85 | 0.09 | 0.06 |

Canonical cluster analysis of antibody structures generated unconditionally using IgDiff and the ABB2 re-predicted structures.

[Figure: RMSD to the central cluster representative in the PyIGClassify dataset for all classified canonical loops. Each panel refers to a different CDR loop and shows the generated IgDiff structures (left), the ABB2 structures predicted on the IgDiff sequences (centre), and a baseline of paired OAS structure predictions (right). In each plot, the individual data points that make up a box plot are shown as a swarm plot, with the cluster class indicated by the point hue.]

For heavy or light chains generated with the same length specification, we investigate the distribution of loop lengths for each CDR, since CDRs of different lengths are necessarily structurally diverse. This is shown in Fig. \ref{fig:generated-cdr-lengths}. IgDiff generates diverse CDR loop lengths largely independently of the pre-specified chain length. For the light chain, the diversity of CDR loop lengths increases with chain length, while for the heavy chain it remains broadly constant across chain lengths (with the exception of CDR H1 at short chain lengths).
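The three columns of the canonical-cluster table can be tallied from paired cluster labels as sketched below. One assumption is made: a pair where only one of the two structures is unclassified counts as mismatching. This is consistent with each table row summing to one, but the text does not state it explicitly.

```python
def cluster_agreement(generated_labels, repredicted_labels):
    """Fractions of matching, mismatching and both-unclassified cluster assignments."""
    if len(generated_labels) != len(repredicted_labels):
        raise ValueError("label lists must have the same length")
    counts = {"matching": 0, "mismatching": 0, "both_unclassified": 0}
    for a, b in zip(generated_labels, repredicted_labels):
        if a == "unclassified" and b == "unclassified":
            counts["both_unclassified"] += 1
        elif a == b:
            counts["matching"] += 1
        else:
            # Includes pairs where only one of the two structures is unclassified.
            counts["mismatching"] += 1
    n = len(generated_labels)
    return {key: value / n for key, value in counts.items()}


gen = ["H1-13-1", "H1-13-1", "unclassified", "H1-13-3"]
rep = ["H1-13-1", "H1-13-2", "unclassified", "unclassified"]
print(cluster_agreement(gen, rep))
# {'matching': 0.25, 'mismatching': 0.5, 'both_unclassified': 0.25}
```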
We further observe that IgDiff varies the heavy-chain length primarily through the CDR H3 length, and the light-chain length through the CDR L1 and L3 lengths, mirroring the natural distributions of CDR loop lengths.

[Figure: Distribution of CDR loop lengths depending on the generated heavy or light chain length.]

To explore sequence diversity, we compute, for each generated IgDiff structure, the minimum Levenshtein distance (also known as edit distance) between its sequence and the sequences of all other generated structures. The resulting distribution is shown in Fig. \ref{fig:min-edit}. Each generated antibody is between 3 and 56 edits away from its closest sequence, with a median of 15 edits. For comparison, we also show the minimum pairwise edit distances between 800 predicted structures taken from the paired OAS dataset with the same chain lengths as the IgDiff unconditioned dataset. Both distributions are similar, with IgDiff producing fewer antibodies with very low or very high minimum edit distances, and more antibodies with intermediate edit distances, than the OAS baseline.

[Figure: Histogram of the minimum pairwise Levenshtein distance between 800 unconditioned IgDiff-generated structures (blue) and 800 paired OAS sequences (orange).]

The starting-point antibodies used in each design task are given in Table \ref{tab:design_tasks}. In this appendix, we also consider the task of redesigning the CDRL3 loop with varying lengths, to study the recapitulation of the canonical clusters of designs with modified loop lengths. This task is left out of the main text for conciseness, but achieves 100% success on the scRMSD test for IgDiff compared to 85% for RFDiffusion, with both models obtaining 100% success on the confidence test.
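The minimum pairwise edit distance used in the sequence-diversity analysis can be sketched with a plain dynamic-programming Levenshtein distance (unit cost for insertions, deletions and substitutions). Representing each antibody as a single string, for example its concatenated heavy and light chain sequences, is an assumption of this sketch. The pure-Python version is quadratic in both sequence length and number of sequences, so a compiled implementation would be preferable for 800 full variable domains.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def min_edit_distances(seqs):
    """For each sequence, the edit distance to its closest other sequence in the set."""
    return [
        min(levenshtein(s, t) for j, t in enumerate(seqs) if j != i)
        for i, s in enumerate(seqs)
    ]


print(levenshtein("kitten", "sitting"))                 # 3
print(min_edit_distances(["EVQLV", "EVQLL", "QVQLQ"]))  # [1, 1, 2]
```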
| Design task | Subtasks | Reference antibody | Total samples |
|---|---|---|---|
| CDRH3 length change | H3 lengths 10-19 | 7seg | 100 |
| CDRL3 length change | L3 lengths 8-11 | 7sem | 40 |
| Design all CDRs | Different antibodies | 7ps6, 7q4q, 7rp2, 7ttm, 7u8c | 50 |
| Design light chain | Different heavy chains | 7qf0, 7rxl, 7zf6 | 30 |

Conditional design tasks. We sample 10 structures for each inpainting subtask. The CDRL3 length change design task is reported only in this appendix.

In Figure \ref{fig:scRMSD-conditional}, we show the scRMSD of each region across the different design tasks, comparing IgDiff with the RFDiffusion baseline. In all design tasks, IgDiff has a lower mean and median scRMSD in the inpainted region. The most challenging task for IgDiff is designing all of the CDRs. Interestingly, the region that IgDiff struggles most to model in this conditional task appears to be CDRH1, although this region was well modelled in unconditional generation. For all other tasks, almost all designed antibodies retain an scRMSD below 2 Å across all regions, indicating that the conditioned antibodies retain the designability of the unconditioned model. Figure \ref{fig:fixed-region-rmsd} shows the RMSD to the input structure over the fixed regions of each design task. This is below 1 Å for all generated structures, demonstrating that the motifs are well preserved during the inpainting procedure.

[Figure: scRMSD of antibodies generated conditionally using IgDiff and RFDiffusion for different design tasks. The red line indicates the 2 Å cutoff used to designate designable structures.]

[Figure: RMSD of the fixed regions during conditional design tasks using IgDiff and RFDiffusion. For each design task, the RMSD between the generated design and the reference antibody is evaluated over the fixed region.]

In Table \ref{tab:conditional_clusters}, we give the fraction of designs whose canonical clusters match those of the ABB2 re-predicted structures for each design task.
Here we note good agreement across all CDR loops except CDR H1. Notably, IgDiff matches or exceeds RFDiffusion's fraction of matching clusters for every region except CDR H1 in the all-CDR design task, where both models perform poorly.

| Task | Region | IgDiff matching | IgDiff mismatching | IgDiff both unclassified | RFDiffusion matching | RFDiffusion mismatching | RFDiffusion both unclassified |
|---|---|---|---|---|---|---|---|
| Design all CDRs | CDRH1 | 0.02 | 0.96 | 0.02 | 0.06 | 0.84 | 0.10 |
| Design all CDRs | CDRH2 | 1.00 | 0.00 | 0.00 | 1.00 | 0.00 | 0.00 |
| Design all CDRs | CDRL1 | 0.70 | 0.24 | 0.06 | 0.02 | 0.74 | 0.24 |
| Design all CDRs | CDRL2 | 0.98 | 0.02 | 0.00 | 0.02 | 0.98 | 0.00 |
| Design all CDRs | CDRL3 | 0.90 | 0.10 | 0.00 | 0.46 | 0.54 | 0.00 |
| CDRL3 length change | CDRL3 | 0.70 | 0.28 | 0.03 | 0.68 | 0.33 | 0.00 |
| Design light chain | CDRL1 | 0.80 | 0.17 | 0.03 | - | - | - |
| Design light chain | CDRL2 | 1.00 | 0.00 | 0.00 | - | - | - |
| Design light chain | CDRL3 | 0.87 | 0.07 | 0.07 | - | - | - |

Fraction of matching and mismatching canonical clusters between the antibody structures generated conditionally (with IgDiff or RFDiffusion) and the corresponding ABB2 re-predicted structures. RFDiffusion did not produce valid antibody light chains in the light chain design task.

References

@misc{diffpack, title={DiffPack: A Torsional Diffusion Model for Autoregressive Protein Side-Chain Packing}, author={Yangtian Zhan and Zuobai Zhang and Bozitao Zhong and Sanchit Misra and Jian Tang}, year={2023}, eprint={2306.01794}, archiveprefix={arXiv}, primaryclass={q-bio.QM} }
@inproceedings{hsrn, title={Antibody-Antigen Docking and Design via Hierarchical Structure Refinement}, author={Jin, Wengong and Barzilay, Regina and Jaakkola, Tommi}, booktitle={Proceedings of the 39th International Conference on Machine Learning}, pages={10217--10227}, year={2022}, editor={Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan}, volume={162}, series={Proceedings of Machine Learning Research}, month={17--23 Jul}, publisher={PMLR}, pdf={https://proceedings.mlr.press/v162/jin22a/jin22a.pdf}, url={https://proceedings.mlr.press/v162/jin22a.html} }
@inproceedings{se3transformers, title={SE(3)-Transformers: 3D Roto-Translation Equivariant Attention Networks}, author={Fabian B. Fuchs and Daniel E.
Worrall and Volker Fischer and Max Welling}, year={2020}, booktitle={Advances in Neural Information Processing Systems 33 (NeurIPS)} }
@misc{dymean, title={End-to-End Full-Atom Antibody Design}, author={Xiangzhe Kong and Wenbing Huang and Yang Liu}, year={2023}, eprint={2302.00203}, archiveprefix={arXiv}, primaryclass={q-bio.BM} }
@inproceedings{diffusion, title={Deep Unsupervised Learning using Nonequilibrium Thermodynamics}, author={Sohl-Dickstein, Jascha and Weiss, Eric and Maheswaranathan, Niru and Ganguli, Surya}, booktitle={Proceedings of the 32nd International Conference on Machine Learning}, pages={2256--2265}, year={2015}, editor={Bach, Francis and Blei, David}, volume={37}, series={Proceedings of Machine Learning Research}, address={Lille, France}, month={07--09 Jul}, publisher={PMLR}, pdf={http://proceedings.mlr.press/v37/sohl-dickstein15.pdf}, url={https://proceedings.mlr.press/v37/sohl-dickstein15.html} }
@article{lstmab, author={Saka, Koichiro and Kakuzaki, Taro and Metsugi, Shoichi and Kashiwagi, Daiki and Yoshida, Kenji and Wada, Manabu and Tsunoda, Hiroyuki and Teramoto, Reiji}, date={2021/03/12}, doi={10.1038/s41598-021-85274-7}, id={Saka2021}, issn={2045-2322}, journal={Scientific Reports}, number={1}, pages={5852}, title={Antibody design using LSTM based deep generative model from phage display library for affinity maturation}, url={https://doi.org/10.1038/s41598-021-85274-7}, volume={11}, year={2021} }
@article{10.1093/bioinformatics/btz895, author={Liu, Ge and Zeng, Haoyang and Mueller, Jonas and Carter, Brandon and Wang, Ziheng and Schilz, Jonas and Horny, Geraldine and Birnbaum, Michael E and Ewert, Stefan and Gifford, David K}, title={Antibody complementarity determining region design using high-capacity machine learning}, journal={Bioinformatics}, volume={36}, number={7}, pages={2126--2133}, year={2019}, month={11}, issn={1367-4803}, doi={10.1093/bioinformatics/btz895}, url={https://doi.org/10.1093/bioinformatics/btz895}, eprint={https://academic.oup.com/bioinformatics/article-pdf/36/7/2126/33027680/btz895.pdf} }
@inproceedings{mean, title={Conditional Antibody Design as 3D Equivariant Graph Translation}, author={Xiangzhe Kong and Wenbing Huang and Yang Liu}, booktitle={The Eleventh International Conference on Learning Representations}, year={2023}, url={https://openreview.net/forum?id=LFHFQbjxIiP} }
@inproceedings{diffab, title={Antigen-Specific Antibody Design and Optimization with Diffusion-Based Generative Models for Protein Structures}, author={Shitong Luo and Yufeng Su and Xingang Peng and Sheng Wang and Jian Peng and Jianzhu Ma}, booktitle={Advances in Neural Information Processing Systems}, editor={Alice H. Oh and Alekh Agarwal and Danielle Belgrave and Kyunghyun Cho}, year={2022}, url={https://openreview.net/forum?id=jSorGn2Tjg} }
@inproceedings{refinegnn, title={Iterative Refinement Graph Neural Network for Antibody Sequence-Structure Co-design}, author={Wengong Jin and Jeremy Wohlwend and Regina Barzilay and Tommi S. Jaakkola}, booktitle={International Conference on Learning Representations}, year={2022}, url={https://openreview.net/forum?id=LI2bhrE_2A} }
@article{10.1371/journal.pcbi.1006112, doi={10.1371/journal.pcbi.1006112}, author={Adolf-Bryfogle, Jared and Kalyuzhniy, Oleks and Kubitz, Michael and Weitzner, Brian D. and Hu, Xiaozhen and Adachi, Yumiko and Schief, William R. and Dunbrack, Jr., Roland L.}, journal={PLOS Computational Biology}, publisher={Public Library of Science}, title={RosettaAntibodyDesign (RAbD): A general framework for computational antibody design}, year={2018}, month={04}, volume={14}, url={https://doi.org/10.1371/journal.pcbi.1006112}, pages={1--38}, number={4} }
@article{chothia, title={Canonical structures for the hypervariable regions of immunoglobulins}, journal={Journal of Molecular Biology}, volume={196}, number={4}, pages={901--917}, year={1987}, issn={0022-2836}, doi={10.1016/0022-2836(87)90412-8}, url={https://www.sciencedirect.com/science/article/pii/0022283687904128}, author={Cyrus Chothia and Arthur M. Lesk} }
@article{10.1371/journal.pone.0105954, doi={10.1371/journal.pone.0105954}, author={Li, Tong and Pantazes, Robert J. and Maranas, Costas D.}, journal={PLOS ONE}, publisher={Public Library of Science}, title={OptMAVEn – A New Framework for the de novo Design of Antibody Variable Region Models Targeting Specific Antigen Epitopes}, year={2014}, month={08}, volume={9}, url={https://doi.org/10.1371/journal.pone.0105954}, pages={1--17}, number={8} }
@article{tmscore, title={Scoring function for automated assessment of protein structure template quality}, author={Zhang, Yang and Skolnick, Jeffrey}, doi={10.1002/prot.20264}, number={4}, volume={57}, month={December}, year={2004}, journal={Proteins}, issn={0887-3585}, pages={702--710}, url={https://doi.org/10.1002/prot.20264} }
@inproceedings{smcdiff, title={Diffusion Probabilistic Modeling of Protein Backbones in 3D for the motif-scaffolding problem}, author={Brian L. Trippe and Jason Yim and Doug Tischer and David Baker and Tamara Broderick and Regina Barzilay and Tommi S. Jaakkola}, booktitle={The Eleventh International Conference on Learning Representations}, year={2023}, url={https://openreview.net/forum?id=6TxBxqNME1Y} }
@misc{lambo2, title={Protein Design with Guided Discrete Diffusion}, author={Nate Gruver and Samuel Stanton and Nathan C. Frey and Tim G. J. Rudner and Isidro Hotzel and Julien Lafrance-Vanasse and Arvind Rajpal and Kyunghyun Cho and Andrew Gordon Wilson}, year={2023}, eprint={2305.20009}, archiveprefix={arXiv}, primaryclass={cs.LG} }
@article{proteindesign, author={Sidney Lyayuga Lisanza and Jake Merle Gershon and Sam Tipps and Lucas Arnoldt and Samuel Hendel and Jeremiah Nelson Sims and Xinting Li and David Baker}, title={Joint Generation of Protein Sequence and Structure with RoseTTAFold Sequence Space Diffusion}, id={2023.05.08.539766}, year={2023}, doi={10.1101/2023.05.08.539766}, publisher={Cold Spring Harbor Laboratory}, url={https://www.biorxiv.org/content/early/2023/05/10/2023.05.08.539766}, eprint={https://www.biorxiv.org/content/early/2023/05/10/2023.05.08.539766.full.pdf}, journal={bioRxiv} }
@article{framedipt, author={Cheng Zhang and Adam Leach and Thomas Makkink and Miguel Arbesú and Ibtissem Kadri and Daniel Luo and Liron Mizrahi and Sabrine Krichen and Maren Lang and Andrey Tovchigrechko and Nicolas Lopez Carranza and Uğur Şahin and Karim Beguir and Michael Rooney and Yunguan Fu}, title={FrameDiPT: SE(3) Diffusion Model for Protein Structure Inpainting}, id={2023.11.21.568057}, year={2023}, doi={10.1101/2023.11.21.568057}, publisher={Cold Spring Harbor Laboratory}, url={https://www.biorxiv.org/content/early/2023/11/21/2023.11.21.568057}, eprint={https://www.biorxiv.org/content/early/2023/11/21/2023.11.21.568057.full.pdf}, journal={bioRxiv} }
@article{engreview, title={Antibody structure and function: the basis for engineering therapeutics}, author={Chiu, Mark L and Goulet, Dennis R and Teplyakov, Alexey and Gilliland, Gary L}, journal={Antibodies}, volume={8}, number={4}, pages={55}, year={2019}, publisher={MDPI} }
@article{cdrh3_2011, author={Narciso, Jo Erika and Uy, Iris and Cabang, April and Chavez, Jenina and Pablo, Juan and Padilla-Concepcion, Gisela and Padlan, Eduardo}, year={2011}, month={04}, pages={435--447}, title={Analysis of the antibody structure based on high-resolution crystallographic studies}, volume={28}, journal={New Biotechnology}, doi={10.1016/j.nbt.2011.03.012} }
@article{cdrh3, author={Tsuchiya, Yuko and Mizuguchi, Kenji}, year={2016}, month={01}, title={The diversity of H3 loops determines the antigen-binding tendencies of antibody CDR loops}, volume={25}, journal={Protein Science}, doi={10.1002/pro.2874} }
@article{af2, author={Jumper, John and Evans, Richard and Pritzel, Alexander and Green, Tim and Figurnov, Michael and Ronneberger, Olaf and Tunyasuvunakool, Kathryn and Bates, Russ and Žídek, Augustin and Potapenko, Anna and Bridgland, Alex and Meyer, Clemens and Kohl, Simon A. A. and Ballard, Andrew J. and Cowie, Andrew and Romera-Paredes, Bernardino and Nikolov, Stanislav and Jain, Rishub and Adler, Jonas and Back, Trevor and Petersen, Stig and Reiman, David and Clancy, Ellen and Zielinski, Michal and Steinegger, Martin and Pacholska, Michalina and Berghammer, Tamas and Bodenstein, Sebastian and Silver, David and Vinyals, Oriol and Senior, Andrew W. and Kavukcuoglu, Koray and Kohli, Pushmeet and Hassabis, Demis}, date={2021/08/01}, doi={10.1038/s41586-021-03819-2}, id={Jumper2021}, issn={1476-4687}, journal={Nature}, number={7873}, pages={583--589}, title={Highly accurate protein structure prediction with AlphaFold}, url={https://doi.org/10.1038/s41586-021-03819-2}, volume={596}, year={2021} }
@misc{riemannian, title={Riemannian Score-Based Generative Modelling}, author={Valentin De Bortoli and Emile Mathieu and Michael Hutchinson and James Thornton and Yee Whye Teh and Arnaud Doucet}, year={2022}, eprint={2202.02763}, archiveprefix={arXiv}, primaryclass={cs.LG} }
@misc{ddpm, title={Denoising Diffusion Probabilistic Models}, author={Jonathan Ho and Ajay Jain and Pieter Abbeel}, year={2020}, eprint={2006.11239}, archiveprefix={arXiv}, primaryclass={cs.LG} }
@misc{adam, title={Adam: A Method for Stochastic Optimization}, author={Diederik P. Kingma and Jimmy Ba}, year={2017}, eprint={1412.6980}, archiveprefix={arXiv}, primaryclass={cs.LG} }
@misc{dsm, title={Score-Based Generative Modeling through Stochastic Differential Equations}, author={Yang Song and Jascha Sohl-Dickstein and Diederik P. Kingma and Abhishek Kumar and Stefano Ermon and Ben Poole}, year={2021}, eprint={2011.13456}, archiveprefix={arXiv}, primaryclass={cs.LG} }
@article{sequencegen, title={Protein sequence design with deep generative models}, journal={Current Opinion in Chemical Biology}, volume={65}, pages={18--27}, year={2021}, note={Mechanistic Biology * Machine Learning in Chemical Biology}, issn={1367-5931}, doi={10.1016/j.cbpa.2021.04.004}, url={https://www.sciencedirect.com/science/article/pii/S136759312100051X}, author={Zachary Wu and Kadina E. Johnston and Frances H. Arnold and Kevin K. Yang}, keywords={Deep learning, Generative models, Protein engineering} }
@misc{abdiffuser, title={AbDiffuser: Full-Atom Generation of In-Vitro Functioning Antibodies}, author={Karolis Martinkus and Jan Ludwiczak and Kyunghyun Cho and Wei-Ching Liang and Julien Lafrance-Vanasse and Isidro Hotzel and Arvind Rajpal and Yan Wu and Richard Bonneau and Vladimir Gligorijevic and Andreas Loukas}, year={2023}, eprint={2308.05027}, archiveprefix={arXiv}, primaryclass={q-bio.BM} }
@article{protpardelle, author={Alexander E. Chu and Lucy Cheng and Gina El Nesr and Minkai Xu and Po-Ssu Huang}, title={An all-atom protein generative model}, id={2023.05.24.542194}, year={2023}, doi={10.1101/2023.05.24.542194}, publisher={Cold Spring Harbor Laboratory}, url={https://www.biorxiv.org/content/early/2023/05/25/2023.05.24.542194}, eprint={https://www.biorxiv.org/content/early/2023/05/25/2023.05.24.542194.full.pdf}, journal={bioRxiv} }
@article{grw, author={Jørgensen, Erik}, date={1975/03/01}, doi={10.1007/BF00533088}, id={JØrgensen1975}, issn={1432-2064}, journal={Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete}, number={1}, pages={1--64}, title={The central limit problem for geodesic random walks}, url={https://doi.org/10.1007/BF00533088}, volume={32}, year={1975} }
@misc{transformer, title={Attention Is All You Need}, author={Ashish Vaswani and Noam Shazeer and Niki Parmar and Jakob Uszkoreit and Llion Jones and Aidan N. Gomez and Lukasz Kaiser and Illia Polosukhin}, year={2017}, eprint={1706.03762}, archiveprefix={arXiv}, primaryclass={cs.CL} }
@inbook{bondangles, author={Engh, R. A. and Huber, R.}, publisher={John Wiley \& Sons, Ltd}, isbn={9780470685754}, title={Structure quality and target parameters}, booktitle={International Tables for Crystallography, Set, Volumes A - G, OnlineMRW}, chapter={18.3}, pages={474--484}, doi={10.1107/97809553602060000857}, url={https://onlinelibrary.wiley.com/doi/abs/10.1107/97809553602060000857}, eprint={https://onlinelibrary.wiley.com/doi/pdf/10.1107/97809553602060000857}, year={2012}, keywords={bias, bond-angle restraints, bond-length restraints, cross validation, force constants, nonbonded interactions, outliers, planarity restraints, refinement, restraints, target parameters, torsion-angle restraints} }
@article{rfdiffusion, author={Joseph L. Watson and David Juergens and Nathaniel R. Bennett and Brian L. Trippe and Jason Yim and Helen E. Eisenach and Woody Ahern and Andrew J. Borst and Robert J. Ragotte and Lukas F. Milles and Basile I. M. Wicky and Nikita Hanikel and Samuel J. Pellock and Alexis Courbet and William Sheffler and Jue Wang and Preetham Venkatesh and Isaac Sappington and Susana Vázquez Torres and Anna Lauko and Valentin De Bortoli and Emile Mathieu and Regina Barzilay and Tommi S. Jaakkola and Frank DiMaio and Minkyung Baek and David Baker}, title={Broadly applicable and accurate protein design by integrating structure prediction networks and diffusion generative models}, id={2022.12.09.519842}, year={2022}, doi={10.1101/2022.12.09.519842}, publisher={Cold Spring Harbor Laboratory}, url={https://www.biorxiv.org/content/early/2022/12/10/2022.12.09.519842}, eprint={https://www.biorxiv.org/content/early/2022/12/10/2022.12.09.519842.full.pdf}, journal={bioRxiv} }
@article{rosetta, title={The Rosetta All-Atom Energy Function for Macromolecular Modeling and Design}, author={Alford, Rebecca F. and Andrew Leaver-Fay and Jeliazkov, Jeliazko R. and O'Meara, Matthew J. and DiMaio, Frank P. and Hahnbeom Park and Shapovalov, Maxim V. and Renfrew, P. Douglas and Mulligan, Vikram K.
and Kalli Kappel and Labonte, Jason W. and Pacella, Michael S. and Richard Bonneau and Philip Bradley and Dunbrack, Roland L. and Rhiju Das and David Baker and Brian Kuhlman and Tanja Kortemme and Gray, Jeffrey J.}, year={2017}, day={13}, doi={10.1021/acs.jctc.7b00125}, language={English (US)}, volume={13}, pages={3031--3048}, journal={Journal of Chemical Theory and Computation}, issn={1549-9618}, publisher={American Chemical Society}, number={6} }@article{afm, author={Richard Evans and Michael O’Neill and Alexander Pritzel and Natasha Antropova and Andrew Senior and Tim Green and Augustin Žıdek and Russ Bates and Sam Blackwell and Jason Yim and Olaf Ronneberger and Sebastian Bodenstein and Michal Zielinski and Alex Bridgland and Anna Potapenko and Andrew Cowie and Kathryn Tunyasuvunakool and Rishub Jain and Ellen Clancy and Pushmeet Kohli and John Jumper and Demis Hassabis}, title={Protein complex prediction with AlphaFold-Multimer}, id={2021.10.04.463034}, year={2022}, doi={10.1101/2021.10.04.463034}, publisher={Cold Spring Harbor Laboratory}, url={https://www.biorxiv.org/content/early/2022/03/10/2021.10.04.463034}, eprint={https://www.biorxiv.org/content/early/2022/03/10/2021.10.04.463034.full.pdf}, journal={bioRxiv} }@article{abb2, author={Brennan Abanades and Wing Ki Wong and Fergus Boyles and Guy Georges and Alexander Bujotzek and Charlotte M. 
Deane}, title={ImmuneBuilder: Deep-Learning models for predicting the structures of immune proteins}, id={2022.11.04.514231}, year={2022}, doi={10.1101/2022.11.04.514231}, publisher={Cold Spring Harbor Laboratory}, url={https://www.biorxiv.org/content/early/2022/11/04/2022.11.04.514231}, eprint={https://www.biorxiv.org/content/early/2022/11/04/2022.11.04.514231.full.pdf}, journal={bioRxiv} }@article{cdhit, author={Fu, Limin and Niu, Beifang and Zhu, Zhengwei and Wu, Sitao and Li, Weizhong}, title={CD-HIT: accelerated for clustering the next-generation sequencing data}, journal={Bioinformatics}, volume={28}, number={23}, pages={3150-3152}, year={2012}, month={10}, issn={1367-4803}, doi={10.1093/bioinformatics/bts565}, url={https://doi.org/10.1093/bioinformatics/bts565}, eprint={https://academic.oup.com/bioinformatics/article-pdf/28/23/3150/18529929/bts565.pdf} }@article{oas1, author={Kovaltsuk, Aleksandr and Leem, Jinwoo and Kelm, Sebastian and Snowden, James and Deane, Charlotte M. and Krawczyk, Konrad}, title={Observed Antibody Space: A Resource for Data Mining Next-Generation Sequencing of Antibody Repertoires}, journal={The Journal of Immunology}, volume={201}, number={8}, pages={2502-2509}, year={2018}, month={10}, issn={0022-1767}, doi={10.4049/jimmunol.1800708}, url={https://doi.org/10.4049/jimmunol.1800708}, eprint={https://journals.aai.org/jimmunol/article-pdf/201/8/2502/1442088/ji1800708.pdf} }@article{oas2, author={Olsen, Tobias H. 
and Boyles, Fergus and Deane, Charlotte M.}, title={Observed Antibody Space: A diverse database of cleaned, annotated, and translated unpaired and paired antibody sequences}, journal={Protein Science}, volume={31}, number={1}, pages={141-146}, keywords={annotated antibody sequences, antibody database, antibody repertoire, antibody sequence, BCR-seq, Observed Antibody Space (OAS)}, doi={https://doi.org/10.1002/pro.4205}, url={https://onlinelibrary.wiley.com/doi/abs/10.1002/pro.4205}, eprint={https://onlinelibrary.wiley.com/doi/pdf/10.1002/pro.4205}, year={2022} }@article{proteinmpnn, author={J. Dauparas and I. Anishchenko and N. Bennett and H. Bai and R. J. Ragotte and L. F. Milles and B. I. M. Wicky and A. Courbet and R. J. de Haas and N. Bethel and P. J. Y. Leung and T. F. Huddy and S. Pellock and D. Tischer and F. Chan and B. Koepnick and H. Nguyen and A. Kang and B. Sankaran and A. K. Bera and N. P. King and D. Baker}, title={Robust deep learning based protein sequence design using ProteinMPNN}, id={2022.06.03.494563}, year={2022}, doi={10.1101/2022.06.03.494563}, publisher={Cold Spring Harbor Laboratory}, url={https://www.biorxiv.org/content/early/2022/06/04/2022.06.03.494563}, eprint={https://www.biorxiv.org/content/early/2022/06/04/2022.06.03.494563.full.pdf}, journal={bioRxiv} }@inproceedings{abmpnn, author={Dreyer, Frédéric A. and Cutting, Daniel and Schneider, Constantin and Kenlay, Henry and Deane, Charlotte M.}, booktitle={2023 ICML Workshop on Computational Biology}, title={Inverse folding for antibody sequence design using deep learning}, url={https://arxiv.org/abs/2310.19513}, year={2023} }@inproceedings{structransfo, author={Ingraham, John and Garg, Vikas and Barzilay, Regina and Jaakkola, Tommi}, booktitle={Advances in Neural Information Processing Systems}, editor={H. Wallach and H. Larochelle and A. Beygelzimer and F. d' Alché-Buc and E. Fox and R. 
Garnett}, pages={}, publisher={Curran Associates, Inc.}, title={Generative Models for Graph-Based Protein Design}, url={https://proceedings.neurips.cc/paper_files/paper/2019/file/f3a4ff4839c56a5f460c88cce3666a2b-Paper.pdf}, volume={32}, year={2019} }@article{greiff, title={A compact vocabulary of paratope-epitope interactions enables predictability of antibody-antigen binding}, journal={Cell Reports}, volume={34}, number={11}, pages={108856}, year={2021}, issn={2211-1247}, doi={https://doi.org/10.1016/j.celrep.2021.108856}, url={https://www.sciencedirect.com/science/article/pii/S2211124721001704}, author={Rahmad Akbar and Philippe A. Robert and Milena Pavlović and Jeliazko R. Jeliazkov and Igor Snapkov and Andrei Slabodkin and Cédric R. Weber and Lonneke Scheffer and Enkelejda Miho and Ingrid Hobæk Haff and Dag Trygve Tryslew Haug and Fridtjof Lund-Johansen and Yana Safonova and Geir K. Sandve and Victor Greiff}, keywords={antibody, antigen, paratope, epitope, structure, prediction, deep learning, machine learning} }@article{bakerAb, author={Nathaniel R. Bennett and Joseph L. Watson and Robert J. Ragotte and Andrew J. Borst and Déjenaé L. See and Connor Weidle and Riti Biswas and Ellen L. Shrock and Philip J. Y. Leung and Buwei Huang and Inna Goreshnik and Russell Ault and Kenneth D. 
Carr and Benedikt Singer and Cameron Criswell and Dionne Vafeados and Mariana Garcia Sanchez and Ho Min Kim and Susana Vázquez Torres and Sidney Chan and David Baker}, title={Atomically accurate de novo design of single-domain antibodies}, id={2024.03.14.585103}, year={2024}, doi={10.1101/2024.03.14.585103}, publisher={Cold Spring Harbor Laboratory}, url={https://www.biorxiv.org/content/early/2024/03/18/2024.03.14.585103}, eprint={https://www.biorxiv.org/content/early/2024/03/18/2024.03.14.585103.full.pdf}, journal={bioRxiv} }@article{pairing, abstract={Georgiou and colleagues describe a single-cell, emulsion-based approach for the high-throughput determination of the paired antibody variable heavy and light chain (VH-VL) repertoire encoded by the more than 2 ×106 B cells in human peripheral blood samples.}, author={DeKosky, Brandon J and Kojima, Takaaki and Rodin, Alexa and Charab, Wissam and Ippolito, Gregory C and Ellington, Andrew D and Georgiou, George}, date={2015/01/01}, added={2024-03-15 00:07:27 +0100}, modified={2024-03-15 00:07:27 +0100}, doi={10.1038/nm.3743}, id={DeKosky2015}, isbn={1546-170X}, journal={Nature Medicine}, number={1}, pages={86--91}, title={In-depth determination and analysis of the human paired heavy- and light-chain antibody repertoire}, url={https://doi.org/10.1038/nm.3743}, volume={21}, year={2015}, 1={https://doi.org/10.1038/nm.3743} }@article{absci, author={Amir Shanehsazzadeh and Julian Alverio and George Kasun and Simon Levine and Jibran A. Khan and Chelsea Chung and Nicolas Diaz and Breanna K. Luton and Ysis Tarter and Cailen McCloskey and Katherine B. Bateman and Hayley Carter and Dalton Chapman and Rebecca Consbruck and Alec Jaeger and Christa Kohnert and Gaelin Kopec-Belliveau and John M. 
Sutton and Zheyuan Guo and Gustavo Canales and Kai Ejan and Emily Marsh and Alyssa Ruelos and Rylee Ripley and Brooke Stoddard and Rodante Caguiat and Kyra Chapman and Matthew Saunders and Jared Sharp and Douglas Ganini da Silva and Audree Feltner and Jake Ripley and Megan E. Bryant and Danni Castillo and Joshua Meier and Christian M. Stegmann and Katherine Moran and Christine Lemke and Shaheed Abdulhaqq and Lillian R. Klug and Sharrol Bachas and Absci Corporation}, title={In vitro validated antibody design against multiple therapeutic antigens using generative inverse folding}, id={2023.12.08.570889}, year={2023}, doi={10.1101/2023.12.08.570889}, publisher={Cold Spring Harbor Laboratory}, url={https://www.biorxiv.org/content/early/2023/12/09/2023.12.08.570889}, eprint={https://www.biorxiv.org/content/early/2023/12/09/2023.12.08.570889.full.pdf}, journal={bioRxiv} }@article{imgt, title={IMGT unique numbering for immunoglobulin and T cell receptor variable domains and Ig superfamily V-like domains}, journal={Developmental & Comparative Immunology}, volume={27}, number={1}, pages={55-77}, year={2003}, issn={0145-305X}, doi={https://doi.org/10.1016/S0145-305X(02)00039-3}, url={https://www.sciencedirect.com/science/article/pii/S0145305X02000393}, author={Marie-Paule Lefranc and Christelle Pommié and Manuel Ruiz and Véronique Giudicelli and Elodie Foulquier and Lisa Truong and Valérie Thouvenin-Contet and Gérard Lefranc}, keywords={IMGT, Immunoglobulin, T cell receptor, Variable domain, Immunoglobulin superfamily, Numbering, 3D structure, Colliers de Perles} }@article{framediff, title={SE(3) diffusion model with application to protein backbone generation}, author={Yim, Jason and Trippe, Brian L and De Bortoli, Valentin and Mathieu, Emile and Doucet, Arnaud and Barzilay, Regina and Jaakkola, Tommi}, journal={arXiv preprint arXiv:2302.02277}, year={2023} }@article{north, title={A New Clustering of Antibody CDR Loop Conformations}, journal={Journal of Molecular Biology}, 
volume={406}, number={2}, pages={228-256}, year={2011}, issn={0022-2836}, doi={https://doi.org/10.1016/j.jmb.2010.10.030}, url={https://www.sciencedirect.com/science/article/pii/S0022283610011496}, author={Benjamin North and Andreas Lehmann and Roland L. Dunbrack}, keywords={antibody structure, canonical loop conformations, affinity propagation} }@misc{genie, title={Generating Novel, Designable, and Diverse Protein Structures by Equivariantly Diffusing Oriented Residue Clouds}, author={Yeqing Lin and Mohammed AlQuraishi}, year={2023}, eprint={2301.12485}, archiveprefix={arXiv}, primaryclass={q-bio.BM} }@article{anarci, author={Dunbar, James and Deane, Charlotte M.}, title={ANARCI: antigen receptor numbering and receptor classification}, journal={Bioinformatics}, volume={32}, number={2}, pages={298-300}, year={2015}, month={09}, issn={1367-4803}, doi={10.1093/bioinformatics/btv552}, url={https://doi.org/10.1093/bioinformatics/btv552}, eprint={https://academic.oup.com/bioinformatics/article-pdf/32/2/298/49016419/bioinformatics_32_2_298.pdf} }@software{zenodo, author={Dreyer, Frédéric A. and Cutting, Daniel}, title={De novo antibody design with SE(3) diffusion}, publisher={Zenodo}, doi={10.5281/zenodo.11184374}, url={https://doi.org/10.5281/zenodo.11184374} }@article{adolf2015pyigclassify, title={PyIgClassify: a database of antibody CDR structural classifications}, author={Adolf-Bryfogle, Jared and Xu, Qifang and North, Benjamin and Lehmann, Andreas and Dunbrack Jr, Roland L}, journal={Nucleic acids research}, volume={43}, number={D1}, pages={D432--D438}, year={2015}, publisher={Oxford University Press} }