Cell reprogramming design by transfer learning of functional transcriptional networks

Thomas P. Wytock, Adilson E. Motter

TL;DR

The authors develop a transfer learning approach to control cell behavior. It is pre-trained on transcriptomic data associated with human cell fates, which yields a model of the functional network dynamics that can be transferred to specific reprogramming goals.

Abstract

Recent developments in synthetic biology, next-generation sequencing, and machine learning provide an unprecedented opportunity to rationally design new disease treatments based on measured responses to gene perturbations and drugs to reprogram cells. The main challenges to seizing this opportunity are the incomplete knowledge of the cellular network and the combinatorial explosion of possible interventions, both of which are insurmountable by experiments. To address these challenges, we develop a transfer learning approach to control cell behavior that is pre-trained on transcriptomic data associated with human cell fates, thereby generating a model of the network dynamics that can be transferred to specific reprogramming goals. The approach combines transcriptional responses to gene perturbations to minimize the difference between a given pair of initial and target transcriptional states. We demonstrate our approach's versatility by applying it to a microarray dataset comprising >9,000 microarrays across 54 cell types and 227 unique perturbations, and an RNASeq dataset consisting of >10,000 sequencing runs across 36 cell types and 138 perturbations. Our approach reproduces known reprogramming protocols with an AUROC of 0.91 while innovating over existing methods by pre-training an adaptable model that can be tailored to specific reprogramming transitions. We show that the number of gene perturbations required to steer from one fate to another increases with decreasing developmental relatedness and that fewer genes are needed to progress along developmental paths than to regress. These findings establish a proof-of-concept for our approach to computationally design control strategies and provide insights into how gene regulatory networks govern phenotype.
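The idea of combining perturbation responses to shrink the distance between an initial and a target transcriptional state can be illustrated with a small sketch. This is not the authors' implementation: it is a simplified greedy scheme, assuming each response is a vector in a low-dimensional latent space, that each selected response gets a single closed-form weight, and that a plain k-nearest-neighbor vote decides when the state has reached the target cell type. All names (`greedy_control`, `knn_label`, the `pert_*` keys) and the toy data are hypothetical.

```python
from collections import Counter
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def knn_label(x, points, labels, k=3):
    """Majority label among the k reference states nearest to x."""
    nearest = sorted(range(len(points)), key=lambda i: dist(x, points[i]))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]


def greedy_control(x_start, x_target, responses, points, labels,
                   target_label, k=3, max_steps=5):
    """Add one perturbation response at a time until KNN predicts the target.

    responses maps a (hypothetical) perturbation name to its response vector.
    Returns (weights by perturbation, final state, whether the target was reached).
    """
    x = list(x_start)
    chosen = {}
    for _ in range(max_steps):
        if knn_label(x, points, labels, k) == target_label:
            break
        best = None
        for name, r in responses.items():
            rr = sum(v * v for v in r)
            if name in chosen or rr == 0:
                continue
            # Weight u that minimizes ||x + u * r - x_target|| for this response.
            u = sum(v * (t - s) for v, t, s in zip(r, x_target, x)) / rr
            cand = [s + u * v for s, v in zip(x, r)]
            d = dist(cand, x_target)
            if best is None or d < best[0]:
                best = (d, name, u, cand)
        if best is None:  # no unused response left
            break
        _, name, u, x = best
        chosen[name] = u
    reached = knn_label(x, points, labels, k) == target_label
    return chosen, x, reached


# Toy latent space with two cell types (made-up numbers).
points = [[0, 0, 0], [0.1, 0, 0], [0, 0.1, 0],
          [3.1, 3.0, 2.8], [3.3, 3.0, 2.8], [3.2, 3.0, 2.8]]
labels = ["start"] * 3 + ["target"] * 3
target_states = points[3:]
# The target state is the average of the individual target-type states.
x_target = [sum(col) / len(target_states) for col in zip(*target_states)]
responses = {
    "pert_A": [1.0, 0.0, 0.0],
    "pert_B": [0.0, 1.0, 0.0],
    "pert_C": [0.0, 0.0, 1.0],
    "pert_D": [-1.0, 0.0, 0.5],
}
chosen, x_final, reached = greedy_control(
    [0.0, 0.0, 0.0], x_target, responses, points, labels, "target")
print(chosen, reached)
```

In this toy example two of the four responses are enough for the KNN vote to flip to the target type. The order in which they are picked is an artifact of the greedy search, not a statement about the order in which perturbations would be applied.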

Paper Structure

This paper contains 24 sections, 7 equations, 12 figures, and 4 tables.

Figures (12)

  • Figure 1: Schematic overview of the data-driven control approach. (A) Construction of the library of transcriptional responses to gene perturbations in the latent space, which is defined as the subspace of selected eigengenes $\mathcal{F}^*$. The pink and teal arrows indicate the experimentally measured shift in transcription from a mock-treated state to a perturbed state (filled and empty circles, respectively) in different cell types (green and blue colors). (B) Perturbation optimization algorithm, where the goal is to drive the initial state $\mathbf{x}^S$ (orange filled circle, "S" for starting) to the target state $\mathbf{x}^A$ (open purple circle, "A" for attractor), which is the average of the individual states of the target cell type (filled purple circles). This is achieved by linearly combining the transcriptional responses to steer the system to a state (open teal circle) that minimizes the distance to the target. Within the algorithm, perturbation responses are added incrementally until the state is predicted to cross the cell type boundary (marked by the patterned surface) as determined by the KNN model. The order in which the incremental perturbations are selected within the algorithm does not imply a temporal ordering in the implementation of the perturbations.
  • Figure 2: Comparison of annotation-based methods, which do not account for off-target effects, with our control approach, which does. (A) Box-and-whisker plots of the coefficient of determination $(R^2)$ of the perturbation predicted using annotation-based methods over all initial cell types for the unconstrained (red), size-constrained (green), and sign-constrained (blue) constraint scenarios applied to each target cell type. For each method, the left, center, and right of each box represent the 25th, 50th (median), and 75th percentiles of the distribution, respectively; the whiskers mark the minimum and maximum, excluding outliers, which are suppressed for clarity. (B) Results corresponding to those in A for our control approach. (C) Coefficient of determination of the sign of the optimal $u_j$ in each method.
  • Figure 3: Receiver operating characteristic (ROC) curves demonstrating the ability of our approach to reproduce known reprogramming protocols. The ROC curves are constructed by comparing single-perturbation strategies identified by our approach ($\mathcal{Q}$, upper diagonal hatching in the rectangle) ranked in order of their distance to the target (Eq. \ref{eq:opt_dist}) against 63 experimentally confirmed reprogramming protocols from the literature ($\mathcal{R}$, lower diagonal hatching in the circle). The sizes of $\mathcal{Q}$ and $\mathcal{R}$ and their overlap are characterized by the true positive rate and false positive rate as defined in the vertical and horizontal axis labels, respectively. The color-coded curves and backgrounds correspond to the median and interquartile range for the constraints indicated in the legend, including the median area under the curve (AUC).
  • Figure 4: Possible transdifferentiation transitions as a function of the number of genes perturbed and the fraction of successful transitions. (A) Largest strongly connected component sizes of the networks created when including an edge for each initial-target pair in the RNASeq dataset for which at least a fraction $f$ of the initial states (vertical axis) are transdifferentiated using at most $g$ perturbations (horizontal axis). (B) Corresponding results for the GeneExp dataset. The circled cases are considered further in subsequent figures.
  • Figure 5: Network of transitions (edges) between cell types (nodes) for the parameters indicated by the circle in Figure 4A. The nodes and outgoing edges are color coded by tissue type. The node size increases with the total number of edges (i.e., the sum of incoming and outgoing edges).
  • ...and 7 more figures
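
The construction described in the Figure 4 caption (include an edge for each initial-target pair when at least a fraction $f$ of the initial states are transdifferentiated using at most $g$ perturbations, then measure the largest strongly connected component) can be sketched as follows. This is an illustrative sketch only: the input format, the function names, and the toy counts are assumptions, and Kosaraju's algorithm is used here simply as a standard way to find strongly connected components.

```python
def transition_edges(results, f, g):
    """Keep (initial, target) pairs that succeed often enough.

    results[(src, dst)] lists, for each initial state of src, the number of
    perturbations needed to reach dst (None if the transition was not achieved).
    """
    edges = []
    for (src, dst), counts in results.items():
        ok = sum(1 for c in counts if c is not None and c <= g)
        if counts and ok / len(counts) >= f:
            edges.append((src, dst))
    return edges


def largest_scc_size(nodes, edges):
    """Size of the largest strongly connected component (Kosaraju, iterative)."""
    fwd = {n: [] for n in nodes}
    rev = {n: [] for n in nodes}
    for a, b in edges:
        fwd[a].append(b)
        rev[b].append(a)

    # Pass 1: record nodes in order of DFS completion on the forward graph.
    seen, order = set(), []
    for root in nodes:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(fwd[root]))]
        while stack:
            node, it = stack[-1]
            for nxt in it:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(fwd[nxt])))
                    break
            else:  # all successors explored
                order.append(node)
                stack.pop()

    # Pass 2: sweep the reversed graph in decreasing completion order.
    seen.clear()
    best = 0
    for root in reversed(order):
        if root in seen:
            continue
        seen.add(root)
        size, todo = 0, [root]
        while todo:
            node = todo.pop()
            size += 1
            for nxt in rev[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
        best = max(best, size)
    return best


# Made-up counts for three hypothetical cell types, three initial states each.
cell_types = ["type_a", "type_b", "type_c"]
results = {
    ("type_a", "type_b"): [1, 2, 2],
    ("type_b", "type_a"): [2, 3, None],
    ("type_b", "type_c"): [1, 1, 1],
    ("type_c", "type_b"): [4, None, None],
}
for f, g in [(0.6, 2), (0.6, 3), (0.3, 4)]:
    size = largest_scc_size(cell_types, transition_edges(results, f, g))
    print(f"f={f}, g={g}: largest SCC has {size} cell type(s)")
```

Loosening either threshold (allowing more perturbations, or requiring a smaller fraction of successes) adds edges, so the largest component can only grow. That monotonic behavior is what a grid over $f$ and $g$ such as the one in Figure 4 is scanning.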