IgCraft: A versatile sequence generation framework for antibody discovery and engineering

Matthew Greenig, Haowen Zhao, Vladimir Radenkovic, Aubin Ramon, Pietro Sormanni

TL;DR

IgCraft introduces a unified Bayesian Flow Network framework for generating paired human antibody sequences, enabling unconditional sampling, sequence inpainting, inverse folding, and CDR grafting within a single model. It combines a two-track transformer with a structure encoder and uses staged training to leverage both sequence and structure information, achieving competitive performance across tasks and state-of-the-art results in CDR grafting under structural conditioning. The approach improves developability metrics such as humanness and solubility while preserving functional features, demonstrating practical utility for antibody discovery and engineering. Overall, IgCraft provides a scalable, versatile platform for sampling human antibody sequences across diverse design contexts with flexible conditioning.

Abstract

Designing antibody sequences to better resemble those observed in natural human repertoires is a key challenge in biologics development. We introduce IgCraft: a multi-purpose model for paired human antibody sequence generation, built on Bayesian Flow Networks. IgCraft presents one of the first unified generative modeling frameworks capable of addressing multiple antibody sequence design tasks with a single model, including unconditional sampling, sequence inpainting, inverse folding, and CDR motif scaffolding. Our approach achieves competitive results across the full spectrum of these tasks while constraining generation to the space of human antibody sequences, exhibiting particular strengths in CDR motif scaffolding (grafting) where we achieve state-of-the-art performance in terms of humanness and preservation of structural properties. By integrating previously separate tasks into a single scalable generative model, IgCraft provides a versatile platform for sampling human antibody sequences under a variety of contexts relevant to antibody discovery and engineering. Model code and weights are publicly available at https://github.com/mgreenig/IgCraft.

Paper Structure

This paper contains 24 sections, 14 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: IgCraft's two-track transformer architecture. Layers are color-coded by the stage of training during which they are updated. The main backbone (blue, green) receives noisy logits for the VH/VL sequences as input and outputs predicted probabilities for the amino acid identity at each position. Shown in the bottom right corner are the minimum/maximum lengths (both sides inclusive) per variable domain region of each antibody chain type. MHA: Multi-head attention; MLP: Multi-layer perceptron; SwiGLU: Swish-gated linear unit.
  • Figure 2: Mean amino acid recovery (AAR) for sequence inpainting of different variable regions on the 2000 holdout sequences from paired OAS. Error bars correspond to the standard error of the estimated mean AAR.
  • Figure 3: Amino acid recovery (AAR) per heavy-chain region on 98 curated human antibody structures from the AbMPNN test set. Error bars correspond to the standard error of the estimated mean AAR.
  • Figure 4: Amino acid recovery (AAR) per light-chain variable region on 98 curated human antibody structures from the AbMPNN test set. Error bars correspond to the standard error of the estimated mean AAR.
  • Figure 5: Illustrative AlphaFold3 predictions for the grafted sequence and ground-truth bound crystal structures for the mouse antibodies used in the CDR grafting experiment. We show the ground-truth target protein as a surface with the predicted humanised VH/VL structure in gold and the ground-truth mouse antibody structure in blue. Three examples are chosen for visualization: the first (left) is a high-quality docked structure prediction with DockQ = 0.87 (PDB ID: 8TFH), the second (middle) is a docked structure with acceptable quality of DockQ = 0.26 (PDB ID: 8TXU), and the final (right) is an incorrectly docked structure with DockQ = 0.05 (PDB ID: 8TVH). We note that the WT mouse antibody for 8TVH (right) was also incorrectly docked by AF3, like most (9/10) of the grafted antibodies which produced DockQ < 0.23.
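
The captions for Figures 2 to 4 report mean amino acid recovery (AAR) with standard-error bars. As a minimal sketch of how such a summary could be computed, the Python below assumes AAR is the fraction of positions at which a generated region matches the native region of equal length, and that the error bar is the sample standard deviation divided by the square root of the number of sequences. The function names and the toy sequences are hypothetical and are not taken from the IgCraft code base; the paper's exact evaluation procedure may differ.

```python
import math
from statistics import mean, stdev


def amino_acid_recovery(predicted: str, reference: str) -> float:
    """Fraction of positions where the predicted residue equals the reference.

    Assumes both regions are already aligned and have the same length.
    """
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same length")
    if not reference:
        raise ValueError("sequences must be non-empty")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


def mean_and_standard_error(values):
    """Return the sample mean and its standard error (sample std / sqrt(n))."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to estimate a standard error")
    return mean(values), stdev(values) / math.sqrt(n)


# Hypothetical toy data: (generated region, native region) pairs.
pairs = [
    ("ARDYW", "ARDYW"),  # all 5 positions recovered
    ("ARGYW", "ARDYW"),  # 4 of 5 positions recovered
    ("AKDFW", "ARDYW"),  # 3 of 5 positions recovered
]

aars = [amino_acid_recovery(pred, ref) for pred, ref in pairs]
mean_aar, sem_aar = mean_and_standard_error(aars)
print(f"mean AAR = {mean_aar:.3f} +/- {sem_aar:.3f}")
```

In practice the per-sequence AAR values would be computed separately for each variable region (for example each CDR and framework segment) before averaging, so that each bar in the figures corresponds to one region.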