Estimating the Causal Effects of T Cell Receptors

Eli N. Weinstein, Elizabeth B. Wood, David M. Blei

TL;DR

This work introduces a method to infer the causal effects of T cell receptor (TCR) sequences on patient outcomes using observational TCR repertoire sequencing data and clinical outcomes data, and develops a scalable neural-network estimator for the identification formula.

Abstract

A central question in human immunology is how a patient's repertoire of T cells impacts disease. Here, we introduce a method to infer the causal effects of T cell receptor (TCR) sequences on patient outcomes using observational TCR repertoire sequencing data and clinical outcomes data. Our approach corrects for unobserved confounders, such as a patient's environment and life history, by using the patient's immature, pre-selection TCR repertoire. The pre-selection repertoire can be estimated from nonproductive TCR data, which is widely available. It is generated by a randomized mutational process, V(D)J recombination, which provides a natural experiment. We show formally how to use the pre-selection repertoire to draw causal inferences, and develop a scalable neural-network estimator for our identification formula. Our method produces an estimate of the effect of interventions that add a specific TCR sequence to patient repertoires. As a demonstration, we use it to analyze the effects of TCRs on COVID-19 severity, uncovering potentially therapeutic TCRs that are (1) observed in patients, (2) bind SARS-CoV-2 antigens in vitro and (3) have strong positive effects on clinical outcomes.
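The abstract's central argument is that a randomized upstream variable (here, the pre-selection repertoire produced by V(D)J recombination) can act as an instrument that removes bias from unobserved confounders. CAIRE itself uses a neural-network estimator over repertoire distributions. The sketch below is not that estimator. It is a swapped-in linear two-stage least squares (2SLS) toy with scalar variables, meant only to show why a randomized instrument corrects confounding. All names, coefficients, and sample sizes are hypothetical.

```python
import numpy as np

def simulate(n=200_000, true_effect=2.0, seed=0):
    """Toy linear model with the IV structure z -> a -> y, plus u -> a and u -> y."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)                  # unobserved confounder (e.g. environment, life history)
    z = rng.normal(size=n)                  # instrument: randomized, independent of u
    a = z + u + rng.normal(size=n)          # exposure depends on the instrument and the confounder
    y = true_effect * a + 3.0 * u + rng.normal(size=n)  # outcome is also driven by u
    return z, a, y

def ols_slope(x, y):
    """Slope from regressing y on x with an intercept."""
    x_c = x - x.mean()
    return float(x_c @ (y - y.mean()) / (x_c @ x_c))

def two_stage_least_squares(z, a, y):
    """2SLS with one instrument: regress a on z, then regress y on the fitted a."""
    a_hat = a.mean() + ols_slope(z, a) * (z - z.mean())
    return ols_slope(a_hat, y)

z, a, y = simulate()
print(f"naive OLS: {ols_slope(a, y):.2f}")                   # biased upward by u (about 3)
print(f"2SLS:      {two_stage_least_squares(z, a, y):.2f}")  # close to the true effect, 2
```

The naive regression absorbs the confounder's contribution, while the 2SLS estimate uses only the variation in `a` that comes from the randomized `z`.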

Paper Structure

This paper contains 68 sections, 3 theorems, 53 equations, 16 figures, 6 tables.

Key Result

Theorem 1

Assume positivity: $\mathrm{p}(q^a = q^a_\star \mid r) > 0$ a.s. for $r \sim \mathrm{p}(r)$, $q^a_\star \sim \sigma_{a_\star,\epsilon}(q^a_\star \,\vert\, r)$. Then the TCR effect is identified by a formula in terms of $\mathrm{p}(q^a, r)$, where $\mathrm{p}(q^a, r)$ is derived from $\mathrm{p}(q^a, q^z)$ via the relative-fitness equation (eqn:relative-fitness).
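Theorem 1 relies on a relative fitness $r$ derived from the pre-selection distribution $q^z$ and the mature distribution $q^a$. The paper's own relative-fitness equation is not reproduced here. As an assumption, the sketch below uses the standard population-genetics convention on a finite set of sequences, in which selection reweights pre-selection frequencies, $q^a(s) \propto q^z(s)\,r(s)$. The function names are hypothetical.

```python
import numpy as np

def relative_fitness(q_a, q_z):
    """Relative fitness r(s) = q_a(s) / q_z(s) on a finite set of sequences.

    Assumes selection acts multiplicatively, q_a(s) proportional to q_z(s) * r(s),
    with r scaled so that its mean under q_z equals 1.
    """
    q_a = np.asarray(q_a, dtype=float)
    q_z = np.asarray(q_z, dtype=float)
    if np.any((q_z == 0) & (q_a > 0)):
        # A mature sequence with zero pre-selection probability has undefined fitness.
        raise ValueError("q_a puts mass where q_z is zero")
    r = np.divide(q_a, q_z, out=np.zeros_like(q_a), where=q_z > 0)
    return r / np.sum(q_z * r)  # normalize so that E_{q_z}[r] = 1

def select(q_z, r):
    """Apply selection: reweight pre-selection frequencies by fitness, then renormalize."""
    w = np.asarray(q_z, dtype=float) * np.asarray(r, dtype=float)
    return w / w.sum()

q_z = np.array([0.5, 0.3, 0.2])   # hypothetical pre-selection frequencies
q_a = np.array([0.2, 0.3, 0.5])   # hypothetical mature frequencies
r = relative_fitness(q_a, q_z)
print(r)                          # fitness below 1 means the clone shrank under selection
print(select(q_z, r))             # reweighting q_z by r recovers q_a
```

Under this convention, `select(q_z, relative_fitness(q_a, q_z))` returns `q_a`, which is the sense in which $r$ summarizes antigen-dependent selection.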

Figures (16)

  • Figure 1: Estimating the causal effects of TCRs with CAIRE. (a) CAIRE uses repertoire sequencing and clinical outcomes data from patients. The sequencing data includes nonproductive TCRs. (b) CAIRE trains a neural network to estimate the effect of TCR repertoires on clinical outcomes. It uses pre-selection repertoires as an instrumental variable, to correct for unobserved confounders. The pre-selection repertoire develops into the mature repertoire through a process of antigen-dependent natural selection, in which some TCR populations expand and others die off. Productive TCR data provides information about a patient's current repertoire; nonproductive TCR data provides information about the pre-selection repertoire. (c) CAIRE provides an estimate of the effect of giving T cells with a specific TCR to patients, e.g. via TCR-T cell therapy.
  • Figure 2: Causal graphs. (a) Hierarchical causal model. (b) Collapsed causal model. $u_i$: unobserved confounding. $z_{ik}$: pre-selection repertoire sequences. $a_{ij}$: mature repertoire sequences. $y_i$: patient outcomes. $q^z_i$: pre-selection repertoire distribution. $q^a_i$: repertoire distribution. $r_i$: relative fitness.
  • Figure 3: Distribution of TCR effects in held-out patient repertoires. (a) Effect distribution across all repertoire sequences. (b) Effect distribution across sequences with significant effects ($\tilde{p} < 0.05$).
  • Figure 4: Effect heterogeneity within and between patients. (a) The distribution of average effects across repertoires (purple) and the average distribution of effects within repertoires (red). More precisely, each bar of the purple histogram covering interval $\mathcal{I}$ is an estimate of $\mathbb{P}_{Q^a \sim p(q^a)}[\mathbb{E}_{A \sim Q^a}[\textsc{ate}(A;0.1)] \in \mathcal{I}]$. Each bar of the red histogram is an estimate of $\mathbb{E}_{Q^a \sim p(q^a)}[\mathbb{P}_{A \sim Q^a}[ \textsc{ate}(A;0.1) \in \mathcal{I}]]$. (b) Distribution of average effects across repertoires, from patients with different outcomes. Each point at interval $\mathcal{I}$ is an estimate of $\mathbb{P}_{Q^a \sim p(q^a \mid y)}[\mathbb{E}_{A \sim Q^a}[\textsc{ate}(A;0.1)] \in \mathcal{I}]$ for an outcome $y \in \{-1, 0, +1\}$.
  • Figure 5: Distribution of TCR effects across the SARS-CoV-2 genome. x-axis: antigen location in the SARS-CoV-2 genome, indexed by nucleotide. y-axis: estimated causal effect of TCRs that bind that antigen. Each dot represents an individual TCR with a significant effect ($\tilde{p} < 0.05$) that was found to bind an epitope encoded at the given location in the SARS-CoV-2 genome.
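The two histograms in Figure 4(a) differ only in the order of averaging: purple takes the mean effect within each repertoire and then histograms across repertoires, while red histograms within each repertoire and then averages the histograms. A Monte Carlo sketch follows, assuming a hypothetical input layout: one array of per-sequence effect estimates $\textsc{ate}(A;0.1)$ for sequences sampled from each held-out repertoire. Function names are hypothetical, and values outside the bin range are dropped by `np.histogram`.

```python
import numpy as np

def between_repertoire_hist(effects_per_repertoire, bins):
    """Estimate P_{Q^a}[ E_{A ~ Q^a}[ate(A)] in I ] for each bin I (purple histogram)."""
    means = np.array([np.mean(e) for e in effects_per_repertoire])  # one mean per repertoire
    counts, _ = np.histogram(means, bins=bins)
    return counts / len(means)

def within_repertoire_hist(effects_per_repertoire, bins):
    """Estimate E_{Q^a}[ P_{A ~ Q^a}[ate(A) in I] ] for each bin I (red histogram)."""
    # Histogram each repertoire separately as a fraction of its sequences, then average.
    per_rep = [np.histogram(e, bins=bins)[0] / len(e) for e in effects_per_repertoire]
    return np.mean(per_rep, axis=0)

# Hypothetical per-sequence effect estimates for two repertoires.
effects = [np.array([-0.2, 0.1, 0.3]), np.array([0.0, 0.0, 0.6])]
bins = [-1.0, 0.0, 0.5, 1.0]
print(between_repertoire_hist(effects, bins))
print(within_repertoire_hist(effects, bins))
```

For panel (b), the same between-repertoire estimator would be applied separately to the repertoires of patients with each outcome $y \in \{-1, 0, +1\}$.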

Theorems & Definitions (10)

  • Definition 1: Collapsed repertoire IV model
  • Theorem 1: TCR effects are identified
  • Definition S2: Repertoire IV model
  • Definition S3: Intervention by adding a TCR
  • Definition S4: Rewritten repertoire IV model
  • Theorem S1: Repertoire effects are identified
  • proof
  • proof
  • Proposition S1
  • proof