Bayesian Optimization of Antibodies Informed by a Generative Model of Evolving Sequences

Alan Nawzad Amin, Nate Gruver, Yilun Kuang, Lily Li, Hunter Elliott, Calvin McCarter, Aniruddh Raghu, Peyton Greenside, Andrew Gordon Wilson

TL;DR

Clone-informed Bayesian Optimization (CloneBO) is a Bayesian optimization procedure that efficiently optimizes antibodies in the lab by teaching a generative model how the immune system optimizes antibodies.

Abstract

To build effective therapeutics, biologists iteratively mutate antibody sequences to improve binding and stability. Proposed mutations can be informed by previous measurements or by learning from large antibody databases to predict only typical antibodies. Unfortunately, the space of typical antibodies is enormous to search, and experiments often fail to find suitable antibodies on a budget. We introduce Clone-informed Bayesian Optimization (CloneBO), a Bayesian optimization procedure that efficiently optimizes antibodies in the lab by teaching a generative model how our immune system optimizes antibodies. Our immune system makes antibodies by iteratively evolving specific portions of their sequences to bind their target strongly and stably, resulting in a set of related, evolving sequences known as a clonal family. We train a large language model, CloneLM, on hundreds of thousands of clonal families and use it to design sequences with mutations that are most likely to optimize an antibody within the human immune system. We propose to guide our designs to fit previous measurements with a twisted sequential Monte Carlo procedure. We show that CloneBO optimizes antibodies substantially more efficiently than previous methods in realistic in silico experiments and designs stronger and more stable binders in in vitro wet lab experiments.

Paper Structure

This paper contains 55 sections, 4 theorems, 24 equations, 16 figures, 1 table.

Key Result

Proposition 6.1

(Proof in the appendix.) For some constant $D$, and $R=\sqrt{N}\frac{\mathrm{Std}(Y_{1:N})}{\sigma}\mathrm{Cor}(F_{1:N}, Y_{1:N})$, with $\Phi$ as the Gaussian CDF,
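The statistic $R$ defined above depends only on the number of measurements $N$, the spread of the measurements $Y_{1:N}$, the noise scale $\sigma$, and the correlation between the sampled fitness values $F_{1:N}$ and the measurements. A minimal Python sketch of that arithmetic follows. It assumes $\mathrm{Std}$ and $\mathrm{Cor}$ are the population (divide-by-$N$) standard deviation and the Pearson correlation, which the text here does not specify; the function names `summary_statistic_r` and `gaussian_cdf` are hypothetical and not taken from the paper's code.

```python
import math


def gaussian_cdf(x):
    """Standard Gaussian CDF (the Phi named in the proposition)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def summary_statistic_r(f_values, y_values, sigma):
    """Compute R = sqrt(N) * Std(Y) / sigma * Cor(F, Y).

    Assumption: Std is the population standard deviation (divide by N)
    and Cor is the Pearson correlation coefficient.
    """
    n = len(y_values)
    if n != len(f_values) or n < 2:
        raise ValueError("need two equal-length sequences with N >= 2")
    if sigma <= 0:
        raise ValueError("sigma must be positive")

    mean_f = sum(f_values) / n
    mean_y = sum(y_values) / n
    var_f = sum((f - mean_f) ** 2 for f in f_values) / n
    var_y = sum((y - mean_y) ** 2 for y in y_values) / n
    if var_f == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant inputs")

    cov = sum((f - mean_f) * (y - mean_y)
              for f, y in zip(f_values, y_values)) / n
    cor = cov / math.sqrt(var_f * var_y)
    return math.sqrt(n) * math.sqrt(var_y) / sigma * cor


# Hypothetical fitness values and measurements, for illustration only.
fitness = [1.0, 2.0, 3.0, 4.0]
measured = [2.0, 4.0, 6.0, 8.0]
r = summary_statistic_r(fitness, measured, sigma=1.0)
```

Under these assumptions, $R$ grows with $\sqrt{N}$ when the correlation is held fixed, and its sign follows the sign of the correlation between sampled fitness and measurement.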

Figures (16)

  • Figure 1: Our immune system introduces mutations (blue) to evolve weak binders of a target into strong binders (green). The result is a set of related sequences that bind the antigen strongly and stably known as a clonal family. We use a model trained on these families, CloneLM, to perform Bayesian optimization in a procedure called CloneBO. We use experimental data to generate a clonal family that might have evolved to bind our antigen and suggest sequences to test in the lab.
  • Figure 2: CloneLM samples plausible clones. We compare sequences in a clonal family to families generated by CloneLM conditional on $X_0$ ("Prompt"). We align sequences to $X_0$ and highlight locations where sequences differ from $X_0$ in blue. The sampled clonal families have variants in similar places, are about as diverse as the real family, and share similar variants within each family.
  • Figure 3: CloneLM is a prior over fitness functions. (a) For 5 different clonal families, with starting sequences $X_0$, $p_{\mathrm{CloneLM}}(X_{M+1}|X_{1:M}')$ gets close to $p_{\mathrm{CloneLM}}(X_{M_{\mathrm{large}}+1}|X_{1:M_\mathrm{large}})$ in KL. We shade one standard deviation across 10 samples of $X_{1:M_{\mathrm{large}}}, X'_{1:M}$. (b) For 5 different heavy chain clonal families, $p_\mathrm{CloneLM}(X|X_{0:M})$ better predicts sequences in a clonal family when conditioned on more sequences from that same clonal family $X_{0:M}$. We shade one standard deviation across 10 samples of $X_{1:M}$. (c) To sample from our prior $F\mid X_0$ we perform a martingale posterior procedure. (d) We evolve three antibody therapeutics with three mutations from 25 sampled fitness functions. These sequences evolve to look more like human antibodies.
  • Figure 4: We accurately sample functions from the posterior with a twisted SMC procedure. (a) To sample from our posterior, we bias our generated sequences to look more like those sequences that were measured in the lab to be good. (b) A sample from tSMC better fits the data than an importance sample. We show a line of best fit between $F_{1:N}^M$ (fitness from a clone of $M$ sequences) and $Y_{1:N}$ (measurement) for example clonal families sampled by importance sampling or twisted SMC, with $M=6$, $D=300$ particles for IS, and $D=4$ for twisted SMC. (c) We quantify the result from (b) across 10 replicates for various clone sizes $M$.
  • Figure 5: CloneBO efficiently optimizes antibodies in silico. We show the mean and standard deviation of the best achieved value across 10 replicates. (a) CloneBO efficiently optimizes a fitness function. The blue line is CloneBO; the grey lines are LaMBO-Ab, LaMBO, Sapiens, and Greedy. (b) CloneBO optimizes binding and stability in silico over 100 steps of iterative design. It does significantly better than the next best method for binding (p=0.018, Mann-Whitney) and stability (p=0.006, Mann-Whitney).
  • ...and 11 more figures

Theorems & Definitions (6)

  • Proposition 6.1
  • Proposition 6.2
  • Proposition D.1
  • proof
  • Proposition D.2
  • proof