Conditionally Site-Independent Neural Evolution of Antibody Sequences

Stephen Zhewen Lu, Aakarsh Vermani, Kohei Sanno, Jiarui Lu, Frederick A Matsen, Milind Jagota, Yun S. Song

TL;DR

CoSiNE (Conditionally Site-Independent Neural Evolution) is a neural continuous-time Markov chain (CTMC) that learns site-specific rate matrices conditioned on the full antibody sequence, capturing epistatic effects during affinity maturation. The paper proves that CoSiNE is a first-order approximation to the true sequential mutation process, with an error bound quadratic in branch length, and introduces Gillespie-based sampling together with Guided Gillespie, a Taylor-series-guided variant for targeted design. Empirically, CoSiNE achieves state-of-the-art zero-shot variant effect prediction, accurately models intra- and inter-chain epistasis, and enables guided affinity maturation from naive antibodies, including local CDR optimization under mutation budgets. The work bridges phylogenetic sequence evolution and deep learning, offering a principled framework for antibody design with potential impact on vaccine design and therapeutic development. Acknowledged limitations include ignoring indels and the need for broader generalization.

Abstract

Common deep learning approaches for antibody engineering focus on modeling the marginal distribution of sequences. By treating sequences as independent samples, however, these methods overlook affinity maturation as a rich and largely untapped source of information about the evolutionary process by which antibodies explore the underlying fitness landscape. In contrast, classical phylogenetic models explicitly represent evolutionary dynamics but lack the expressivity to capture complex epistatic interactions. We bridge this gap with CoSiNE, a continuous-time Markov chain parameterized by a deep neural network. Mathematically, we prove that CoSiNE provides a first-order approximation to the intractable sequential point mutation process, capturing epistatic effects with an error bound that is quadratic in branch length. Empirically, CoSiNE outperforms state-of-the-art language models in zero-shot variant effect prediction by explicitly disentangling selection from context-dependent somatic hypermutation. Finally, we introduce Guided Gillespie, a classifier-guided sampling scheme that steers CoSiNE at inference time, enabling efficient optimization of antibody binding affinity toward specific antigens.
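To make the sampling idea concrete, here is a minimal, unguided Gillespie simulation of a sequence-level point mutation process, written with the Python standard library only. It is a sketch under stated assumptions, not the paper's implementation: `gillespie_sample` and `toy_rate_fn` are hypothetical names, the toy rate function stands in for the neural network, and re-evaluating the rates after every accepted mutation is an assumption about how context dependence enters.

```python
import random


def gillespie_sample(x, t, rate_fn, rng):
    """Simulate sequential point mutations starting from x for total time t.

    rate_fn(x) must return one rate matrix per site (a list of lists), where
    entry [a][b] is the rate of changing that site from state a to state b.
    """
    x = list(x)  # copy so the caller's sequence is not modified
    elapsed = 0.0
    while True:
        # Assumption: rates depend on the whole current sequence, so they
        # are recomputed after every mutation.
        rates = rate_fn(x)
        # Enumerate every single-site change with a positive rate.
        moves = [
            (site, new, rates[site][cur][new])
            for site, cur in enumerate(x)
            for new in range(len(rates[site]))
            if new != cur and rates[site][cur][new] > 0.0
        ]
        total_rate = sum(rate for _, _, rate in moves)
        if total_rate == 0.0:
            return x  # absorbing state: nothing can mutate
        # Waiting time to the next mutation is exponential in the total rate.
        elapsed += rng.expovariate(total_rate)
        if elapsed > t:
            return x  # the next event would fall outside the branch
        # Choose which mutation fires, proportionally to its rate.
        site, new, _ = rng.choices(moves, weights=[r for _, _, r in moves])[0]
        x[site] = new


def toy_rate_fn(x):
    """Hypothetical stand-in for the network: two states per site, only
    0 -> 1 mutations, three times faster when the left neighbour is 1."""
    mats = []
    for i in range(len(x)):
        boost = 3.0 if i > 0 and x[i - 1] == 1 else 1.0
        mats.append([[-boost, boost], [0.0, 0.0]])
    return mats
```

A call such as `gillespie_sample([0, 0, 0], 0.5, toy_rate_fn, random.Random(0))` returns one sampled descendant sequence. A guided variant would reweight the entries of `moves` before the draw; how Guided Gillespie does this is not specified in this summary, so it is left out of the sketch.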

Paper Structure

This paper contains 46 sections, 4 theorems, 31 equations, 15 figures, 5 tables, 3 algorithms.

Key Result

Proposition 1

(Proof in the appendix, apx:prop1-proof.) Assume the per-site rate matrices $Q_\theta(x)_\ell$ are parameterized such that [condition missing in source] holds for all $x, y$ with Hamming distance $d(x, y) = 1$, where $\ell$ is the unique site at which $x$ and $y$ differ. Then the error between the transition probability vectors is bounded by [bound missing in source; per the abstract, quadratic in branch length], where $\lambda=\max_x\{-\mathbf{Q}_{x,x}\}$ is the maximum exit rate over all states.

Figures (15)

  • Figure 1: Overview of CoSiNE. Given an antibody sequence $x$, the neural network outputs site-specific rate matrices conditioned on the full sequence. Each matrix is evolved for duration $t$ to yield a per-site transition distribution $p(y_l\mid x, t)$. Assuming conditional independence, we take the product of the per-site transition probabilities to yield the full sequence transition probability $p(y\mid x, t)$.
  • Figure 2: Mean per-site likelihood of CoSiNE versus DASM+Thrifty on held-out evolutionary transitions from the test set. CoSiNE achieves a better model fit, especially on transitions with longer branch lengths ($t\ge 0.25$).
  • Figure 3: Categorical Jacobian for antibody 47D11 from CoV-AbDab. The heatmap displays the sensitivity of the model's output predictions ($y$-axis) to specific mutations in the input sequence ($x$-axis). Sensitivity is measured as the Frobenius norm of the change in the predicted rate matrix, averaged over all possible mutations. Cyan squares denote the CDR regions for both chains.
  • Figure 4: DMS evaluation results for CoSiNE across expression (green) and binding (purple) assays. A solid color indicates the log-likelihood, $\log p_\theta(y\mid x,t)$, and hatching indicates the selection score (eq:sel_score), which uses Thrifty likelihoods to separate selection from neutral mutation.
  • Figure 5: Guided Gillespie consistently steers the predicted SARS-CoV-1 binding affinity of the sampled leaf sequences. We plot the change in predicted binding affinity relative to the naive root sequence used to start sampling. Known binders from CoV-AbDab are plotted in red for reference.
  • ...and 10 more figures
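
The computation described in the Figure 1 caption (evolve each per-site rate matrix for duration $t$, then multiply the per-site transition probabilities) can be sketched in plain Python as follows. This is an illustration under assumptions rather than the authors' code: `transition_row` and `log_transition_prob` are hypothetical names, the rate matrices are passed in directly instead of coming from a neural network, and uniformization is one standard way to evaluate a row of $\exp(tQ)$ that is swapped in here for whatever the paper actually uses.

```python
import math


def transition_row(Q, start, t, tol=1e-12, max_terms=10_000):
    """Row `start` of exp(t * Q) for a rate matrix Q, via uniformization."""
    n = len(Q)
    lam = max(-Q[i][i] for i in range(n))  # maximum exit rate
    unit = [1.0 if j == start else 0.0 for j in range(n)]
    if lam == 0.0 or t == 0.0:
        return unit
    # B = I + Q / lam is a stochastic matrix (rows sum to 1, entries >= 0).
    B = [[(1.0 if i == j else 0.0) + Q[i][j] / lam for j in range(n)]
         for i in range(n)]
    v = unit
    weight = math.exp(-lam * t)  # Poisson(lam * t) probability of 0 jumps
    out = [weight * p for p in v]
    covered = weight  # Poisson mass accumulated so far
    k = 0
    while 1.0 - covered > tol and k < max_terms:
        k += 1
        v = [sum(v[i] * B[i][j] for i in range(n)) for j in range(n)]
        weight *= lam * t / k  # Poisson probability of exactly k jumps
        out = [o + weight * p for o, p in zip(out, v)]
        covered += weight
    return out


def log_transition_prob(rate_matrices, x, y, t):
    """log p(y | x, t) as a sum of per-site log-probabilities, assuming the
    sites evolve independently given the starting sequence x."""
    logp = 0.0
    for Q_l, x_l, y_l in zip(rate_matrices, x, y):
        p = transition_row(Q_l, x_l, t)[y_l]
        if p <= 0.0:
            return float("-inf")  # the transition is impossible at this site
        logp += math.log(p)
    return logp
```

Working in log space turns the product over sites into a sum, which avoids underflow for long sequences. In the model described above, `rate_matrices` would be the network's output for the parent sequence `x`; here any list of valid rate matrices (non-negative off-diagonals, rows summing to zero) can be used.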

Theorems & Definitions (6)

  • Proposition 1
  • Lemma 1
  • Proposition 2
  • proof
  • Lemma 2
  • proof