Multi-omics Prediction from High-content Cellular Imaging with Deep Learning

Rahil Mehrizi; Arash Mehrjou; Maryana Alegro; Yi Zhao; Benedetta Carbone; Carl Fishwick; Johanna Vappiani; Jing Bi; Siobhan Sanford; Hakan Keles; Marcus Bantscheff; Cuong Nguyen; Patrick Schwab

TL;DR

Image2Omics demonstrates that high-content cellular images contain informative cues that allow reconstruction of bulk transcriptomics and proteomics for a cell population. The method uses multiple instance learning with a ResNet backbone to map tiled cell patches to omics readouts, and is trained on a diverse set of CRISPR perturbations in hiPSC-derived macrophages under M1/M2 polarization. A notable fraction of transcripts and proteins is predictable, with performance varying by subcellular localisation, pathway membership, and expression level, which suggests imaging can serve as a scalable surrogate in targeted contexts. The approach requires paired training data and careful consideration of context, but it offers potential for non-destructive, time-resolved tracking and heterogeneity analysis in cell-state studies and drug discovery.
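
The multiple instance learning idea above can be illustrated with a minimal sketch: a well image is treated as a bag of tiles, each tile is embedded, the embeddings are pooled, and a head maps the pooled vector to abundances. Everything concrete here is an assumption for illustration rather than the authors' implementation: the tile embeddings are hand-written stand-ins for ResNet features, mean pooling is one possible aggregation (the summary does not state which one is used), and `mean_pool`, `linear_head`, and `predict_bag` are hypothetical names.

```python
from statistics import fmean


def mean_pool(tile_embeddings):
    """Average per-tile embeddings into a single bag-level embedding."""
    if not tile_embeddings:
        raise ValueError("a bag needs at least one tile")
    # zip(*...) iterates over embedding dimensions across all tiles.
    return [fmean(dim) for dim in zip(*tile_embeddings)]


def linear_head(embedding, weights, bias):
    """Map a bag-level embedding to one predicted abundance per target."""
    return [
        sum(w * x for w, x in zip(row, embedding)) + b
        for row, b in zip(weights, bias)
    ]


def predict_bag(tile_embeddings, weights, bias):
    """Pool the tiles of one image, then predict all targets at once."""
    return linear_head(mean_pool(tile_embeddings), weights, bias)


# Hypothetical bag: 3 tiles, each with a 2-dimensional embedding
# (a real backbone would produce far larger feature vectors).
bag = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
# Hypothetical head predicting 2 targets (e.g. two transcripts).
weights = [[1.0, 0.0], [0.5, 0.5]]
bias = [0.0, 1.0]

prediction = predict_bag(bag, weights, bias)
```

Because the pooled embedding does not depend on tile order or on which cell a tile came from, the output is a population-level estimate, which matches the bulk nature of the omics readouts being predicted.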

Abstract

High-content cellular imaging, transcriptomics, and proteomics data provide rich and complementary views on the molecular layers of biology that influence cellular states and function. However, the biological determinants through which changes in multi-omics measurements influence cellular morphology have not yet been systematically explored, and the degree to which multi-omics could be predicted directly from cell imaging data is therefore unclear. Here, we address the question of whether it is possible to predict bulk multi-omics measurements directly from cell images using Image2Omics, a deep learning approach that predicts multi-omics in a cell population directly from high-content images of cells stained with multiplexed fluorescent dyes. We perform an experimental evaluation in gene-edited macrophages derived from human induced pluripotent stem cells (hiPSC) under multiple stimulation conditions and demonstrate that Image2Omics achieves significantly better performance in predicting transcriptomics and proteomics measurements directly from cell images than predictions based on the mean observed training set abundance. We observed significant predictability of abundances for 4927 (18.72%; 95% CI: 6.52%, 35.52%) and 3521 (13.38%; 95% CI: 4.10%, 32.21%) transcripts out of 26137 in M1 and M2-stimulated macrophages, respectively, and for 422 (8.46%; 95% CI: 0.58%, 25.83%) and 697 (13.98%; 95% CI: 2.41%, 32.83%) proteins out of 4986 in M1 and M2-stimulated macrophages, respectively. Our results show that some transcript and protein abundances are predictable from cell imaging and that cell imaging may, in some settings and depending on the mechanisms of interest and desired performance threshold, even be a scalable and resource-efficient substitute for multi-omics measurements.

Paper Structure (32 sections, 7 equations, 13 figures, 4 tables)

Abstract
Introduction
Results
Image2Omics.
Predictability by gene product.
Predictability by subcellular localisation.
Predictability by pathway membership, number of modes of the measurement distribution, and abundance.
Qualitative analysis of feature importance.
Cell image embedding.
Discussion
Limitations.
Materials and methods
Data Acquisition
Library selection
High-throughput 3’ RNA-seq
...and 17 more sections

Figures (13)

Figure 1: Predicting multi-omics from high-content cellular imaging. We generated cell imaging data for a cellular system under a wide range of CRISPR perturbations and exposed to multiple stimuli (top left; in this work: M1- and M2-polarised macrophages). We then trained a machine learning model (Image2Omics; right) on the paired samples for which both imaging and multi-omics (transcriptomics and proteomics) were available. The model learns to predict the multi-omics layers directly from high-content images alone, using independently fine-tuned models for each omics modality and stimulation condition.
Figure 2: Overall prediction performance. Correlation coefficients ($r^2$; y-axis, higher = more predictable) between observed protein and transcript abundances and those predicted by Image2Omics in M1 and M2 polarised states on all held-out test set samples. Dots correspond to transcript or protein markers and violins indicate the distribution of coefficients of determination across the transcriptome and proteome in M1 and M2 states. Selected genes from the top and bottom 10 for each stimulus and gene product, i.e. those with the lowest and highest prediction errors respectively, are available in \ref{tb:top_genes_performances_1}, \ref{tb:top_genes_performances_2}, \ref{tb:top_genes_performances_3}, and \ref{tb:top_genes_performances_4}.
Figure 3: Predictability across gene product types and stimuli. Charts displaying the percentage of transcripts (top) and proteins (bottom) in M1 (left column) and M2 (right column) polarised states that are significantly ($p<0.05$) more predictable ($\text{SMAPE}_\text{Image2Omics} < \text{SMAPE}_\text{mean}$) from image data on held-out test data using Image2Omics than using the mean observed abundance in the training set (purple), the percentage of non-significant marker proteins and transcripts (light pink), and those filtered out due to low or no observed expression in the experiment (light grey). We found that overall a similar share of transcriptomics and proteomics abundances was predictable, with the exception of the M1 state, in which transcriptomics abundances were more frequently predictable than proteomics abundances.
Figure 4: Predictability by subcellular localisation of gene products. Performance in predicting measured transcript (top) and protein (bottom) abundances from image data alone, measured in terms of correlation ($r^2$; y-axis, higher is better) on held-out test images of M1 (left column) and M2 (right column) macrophages, broken down by subcellular location (x-axis) and sorted from least (left) to most (right) predictable on average for the 8 largest subcellular localisation categories. We found that gene products located in the plasma membrane and vesicles are more predictable than those in other subcellular locations, with some exceptions (e.g. mitochondria being more predictable in the M1 state proteomics measurements). Note that gene products without a known subcellular location are not shown here. Performance is calculated over all perturbed and unperturbed cell states to cover a diverse range of cellular states.
Figure 5: Features associated with predictability of gene products. Forest plots indicating the associations of features (y-axis) with significant ($p<0.05$) predictability of protein and transcript abundances on held-out test data (x-axis; measured in linear regression beta coefficients; higher = more predictable) in M1 and M2 conditions. Features include pathway membership of the predicted gene (first section from the top; the top 5 pathways associated with lower and with higher predictability are shown), subcellular localisation of the gene (second section from the top), and mean abundance levels observed in the training set (bottom-most section). We found that abundances are more predictable on average for more highly expressed protein products, and that membership in a variety of pathways is associated with differences in predictability of abundances.
...and 8 more figures
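
The comparison used in the Figure 3 caption, SMAPE of the model against SMAPE of a training-set-mean baseline, can be sketched for a single transcript or protein as follows. This is a minimal illustration under stated assumptions: the SMAPE variant below (absolute error divided by the mean of the absolute observed and predicted values, averaged over samples) is one common definition and may differ from the one used in the paper, all abundance values are made up, and the significance test behind the $p<0.05$ threshold is not reproduced.

```python
from statistics import fmean


def smape(observed, predicted):
    """Symmetric mean absolute percentage error, in the range [0, 2].

    Assumed variant: |y - y_hat| / ((|y| + |y_hat|) / 2), averaged over
    samples; a term where both values are zero counts as zero error.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have equal length")
    terms = []
    for y, y_hat in zip(observed, predicted):
        denom = (abs(y) + abs(y_hat)) / 2
        terms.append(0.0 if denom == 0 else abs(y - y_hat) / denom)
    return fmean(terms)


# Made-up abundances for one gene product (illustrative only).
train_abundance = [2.0, 4.0, 6.0]
test_observed = [2.0, 6.0]
model_predicted = [2.0, 5.0]  # hypothetical image-based predictions

# Baseline: predict the mean training-set abundance for every test sample.
baseline_predicted = [fmean(train_abundance)] * len(test_observed)

smape_model = smape(test_observed, model_predicted)
smape_mean = smape(test_observed, baseline_predicted)

# The gene product counts as "more predictable" only if the model's error
# is below the baseline's; the paper additionally requires significance.
model_beats_baseline = smape_model < smape_mean
```

A mean baseline is a natural reference point because it uses no image information at all, so any improvement over it must come from signal in the images.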
