Inference of germinal center evolutionary dynamics via simulation-based deep learning

Duncan K Ralph, Athanasios G Bakis, Jared Galloway, Ashni A Vora, Tatsuya Araki, Gabriel D Victora, Yun S Song, William S DeWitt, Frederick A Matsen

TL;DR

Deep learning and simulation-based inference are used to learn the “affinity-fitness response function” of germinal center B cells: the relationship by which B cells with higher affinity for their cognate antigen tend, on average, to have more offspring.

Abstract

B cells and the antibodies they produce are vital to health and survival, motivating research on the details of the mutational and evolutionary processes in the germinal centers (GC) from which mature B cells arise. It is known that B cells with higher affinity for their cognate antigen (Ag) will, on average, tend to have more offspring. However, the exact form of this relationship between affinity and fecundity, which we call the “affinity-fitness response function”, is not known. Here we use deep learning and simulation-based inference to learn this function from a unique experiment that replays a particular combination of GC conditions many times. All code is freely available at https://github.com/matsengrp/gcdyn, while datasets and inference results can be found at https://doi.org/10.5281/zenodo.15022130.
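To make the idea of an affinity-fitness response function concrete, the following is a minimal sketch of a sigmoid mapping from affinity to birth rate. The figure captions for this paper describe a sigmoid form, but the four-parameter shape used here, the parameter names (`xscale`, `xshift`, `yscale`, `yshift`), and all numeric values are assumptions chosen for illustration, not the authors' parametrization.

```python
import math


def sigmoid_response(affinity, xscale=1.0, xshift=0.0, yscale=1.0, yshift=0.0):
    """Hypothetical sigmoid response: birth rate as a function of affinity.

    xscale sets the steepness, xshift the midpoint, yscale the height,
    and yshift the baseline rate. These names are illustrative only.
    """
    return yscale / (1.0 + math.exp(-xscale * (affinity - xshift))) + yshift


# Evaluate the response on a small, made-up grid of affinity values.
affinities = [-2.0, -1.0, 0.0, 1.0, 2.0]
rates = [
    sigmoid_response(a, xscale=2.0, xshift=0.5, yscale=3.0, yshift=0.1)
    for a in affinities
]
```

Under this assumed form, the rate rises monotonically with affinity and saturates between `yshift` and `yshift + yscale`, which matches the qualitative statement that higher-affinity cells tend to have more offspring.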

Paper Structure

This paper contains 23 sections, 7 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 3: Overview of the workflow for this paper. After specifying a birth-death model with a sigmoid affinity-fitness response function, we simulate many trees and their sequences at each node, with parameters roughly consistent with our real data samples. The simulation is cell-based and implements a carrying-capacity population size limit. The results of the simulation are then encoded and used to train a neural network that infers the sigmoid response parameters on real data. In addition to encoded trees, the network also takes as input the assumed values of several non-sigmoid parameters (carrying capacity, initial population size, and death rate). Inference on real data is performed many times, with many different combinations of non-sigmoid parameter values, and an additional "data mimic" simulation is generated using each of the resulting inferred parameter value combinations. The summary statistics of each data mimic sample are compared to real data, with the best match selected as the "central data mimic" sample with final parameter values for both sigmoid (inferred with the neural network) and non-sigmoid (inferred by matching summary statistics) parameters. Plots from elsewhere in the manuscript are rendered in schematic form: those in "infer on data" refer to \ref{fig:dataCurves} through \ref{figsupp:dataExampleCurvesSigmoid}, and those in "simulate with inferred parameters" to \ref{fig:simuSummaryStatReplayPlotCkdl}.
  • Figure 4: Training and testing results for the sigmoid model on simulation. We show curve difference loss distributions on several subsets of the training sample (where each GC has different parameter values): training and validation (left) and testing (right). For computational efficiency when plotting, the curve difference distributions display only the first 1000 values. See Figure 5 for the per-bin model.
  • Figure 5: Training and testing results for the per-bin model on simulation. See Figure 4 for details.
  • Figure 6: Inferred response functions on real data for sigmoid (left) and per-bin (right) models, corresponding to the non-sigmoid parameter values yielding simulation with the best-matching summary statistics. The medoid curve is shown in orange with 68% and 95% confidence intervals in blue, with observed affinity values in grey.
  • Figure 7: Summary statistics on data vs simulation for the central data mimic simulation sample that most closely mimics inferred data parameters. Simulation truth (dashed green) is unobservable and shown only for completeness; the important comparison is between purple and green solid lines, where both data and simulation have been run through IQ-TREE.
  • ...and 3 more figures
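
The captions above refer to a "curve difference" between response functions and to a "medoid curve" among inferred curves. The sketch below illustrates both ideas under stated assumptions: the curve difference is taken here to be the mean absolute difference between two curves sampled on a shared affinity grid (the paper's exact definition is not given in this excerpt), and the medoid is the standard one, the member of a set with the smallest total distance to all other members. The helper names, the parameter sets, and the grid are all made up for illustration.

```python
import math


def sigmoid(x, xscale, xshift, yscale, yshift=0.0):
    """Illustrative sigmoid response curve (assumed parametrization)."""
    return yscale / (1.0 + math.exp(-xscale * (x - xshift))) + yshift


def curve_difference(curve_a, curve_b):
    """Mean absolute difference between two curves on the same grid.

    This is an assumed stand-in for the paper's curve difference metric.
    """
    if len(curve_a) != len(curve_b):
        raise ValueError("curves must be sampled on the same grid")
    return sum(abs(a - b) for a, b in zip(curve_a, curve_b)) / len(curve_a)


def medoid_index(curves):
    """Index of the curve with the smallest summed distance to all curves."""
    totals = [
        sum(curve_difference(c, other) for other in curves) for c in curves
    ]
    return min(range(len(curves)), key=totals.__getitem__)


# Made-up affinity grid and three hypothetical inferred parameter sets
# (xscale, xshift, yscale) that differ only in their height.
grid = [i * 0.5 - 2.0 for i in range(9)]
param_sets = [(1.0, 0.0, 1.0), (1.0, 0.0, 2.0), (1.0, 0.0, 3.0)]
curves = [[sigmoid(x, *p) for x in grid] for p in param_sets]

best = medoid_index(curves)  # the middle-height curve is the medoid here
```

Unlike a pointwise mean, a medoid is always one of the actual inferred curves, so it corresponds to a real set of parameter values.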