Grad-CAMO: Learning Interpretable Single-Cell Morphological Profiles from 3D Cell Painting Images

Vivek Gopalakrishnan, Jingzhe Ma, Zhiyong Xie

TL;DR

This work tackles the interpretability gap in learned single-cell morphological profiles from 3D Cell Painting images. It introduces Grad-CAMO, a simple, model- and data-agnostic score that quantifies how much a learned profile concentrates attention on the cell of interest versus background or neighboring cells, enabling per-cell and dataset-wide audits without changing the imaging workflow. Empirically, the authors show that only a minority of profiles meaningfully align with the target cell, revealing potential confounding factors and neighbor-based shortcuts in supervised feature extraction. Grad-CAMO offers a practical tool for hyperparameter tuning, regularization, and quality control in large-scale single-cell profiling pipelines, with immediate relevance for drug discovery and phenotypic screening.

Abstract

Despite their black-box nature, deep learning models are extensively used in image-based drug discovery to extract feature vectors from single cells in microscopy images. To better understand how these networks perform representation learning, we employ visual explainability techniques (e.g., Grad-CAM). Our analyses reveal several mechanisms by which supervised models cheat, exploiting biologically irrelevant pixels when extracting morphological features from images, such as noise in the background. This raises doubts regarding the fidelity of learned single-cell representations and their relevance when investigating downstream biological questions. To address this misalignment between researcher expectations and machine behavior, we introduce Grad-CAMO, a novel single-cell interpretability score for supervised feature extractors. Grad-CAMO measures the proportion of a model's attention that is concentrated on the cell of interest versus the background. This metric can be assessed per-cell or averaged across a validation set, offering a tool to audit individual feature vectors or guide the improved design of deep learning architectures. Importantly, Grad-CAMO seamlessly integrates into existing workflows, requiring no dataset or model modifications, and is compatible with both 2D and 3D Cell Painting data. Additional results are available at https://github.com/eigenvivek/Grad-CAMO.

Paper Structure

This paper contains 20 sections, 5 equations, 5 figures, and 1 table.

Figures (5)

  • Figure 1: Overview of Grad-CAMO. Given a 3D Cell Painting Z-stack, we first use Cellpose (Stringer et al., 2021) to segment individual cells. Using segmentation masks, we create cuboid 3D crops centered on single cells. Extending approaches commonly used for 2D Cell Painting images (e.g., DeepProfiler; Moshkov et al., 2024), we train a 3D EfficientNet (Tan and Le, 2019) to predict the treatment label of an individual cell from a crop. During inference, held-out cells are passed to the trained network and activations at intermediate layers are used as single-cell feature vectors in $\mathbb R^d$. Visualization of feature vectors using UMAP (McInnes et al., 2018) shows single cells are highly separable based on learned morphological profiles. However, interpretability analysis using Grad-CAM demonstrates that deep learning-based feature extractors do not always pay attention to the cell-of-interest when forming single-cell morphological profiles (on-target vs. off-target). To quantify the fidelity of learned morphological profiles, we introduce Grad-CAMO, a single-cell interpretability metric to quantify the level of confounding in a model's predictions.
  • Figure 2: Automatic segmentations produced by Cellpose. Zoomed-in image patches are shown to demonstrate the accuracy of segmentations produced by Cellpose with minimal hyperparameter tuning. 2D segmentation masks were stitched across the $z$-axis of the 3D Cell Painting images to form 3D segmentation masks.
  • Figure 3: Examples of multi-channel single-cell crops at different dosages. For every cell segmented by Cellpose, a $128 \times 128 \times 21$ crop was extracted from the Z-stack and preprocessed to standardize pixel intensity ranges across samples.
  • Figure 4: (A) UMAP and Grad-CAM. UMAP embedding of features extracted from the single cells in the test set shows that the different treatment labels are highly separable: Control, $0.06\times$, $0.32\times$, $1.60\times$, $8.00\times$, and $40.0\times$. Correction for batch effects is accomplished via a whitening transform. However, cells also cluster by site ($\bullet$, $\times$, $\blacksquare$, $+$), demonstrating the effect of confounding on learned morphological profiles. (B) Where is the model looking? Using Grad-CAM, we identify three patterns in model attention during deep morphological profiling: concentrating on the central cell, a neighboring cell, or the background. In these localization maps, red denotes higher attention. For visualization purposes, we render only the central slice of the Z-stack and overlay the associated slice of the Grad-CAM localization map.
  • Figure 5: (A) Example Grad-CAMO scores. Grad-CAMO is calculated as the proportion of the model's Grad-CAM localization map that lies within the segmentation mask of the central cell. (B) Grad-CAMO scores computed over the entire testing set. Distributions are grouped by treatment dose and which site in the well was imaged (i.e., technical replicate): Site 1, Site 2, Site 3, and Site 4.
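
The Figure 5 caption describes Grad-CAMO as the proportion of the Grad-CAM localization map that lies within the segmentation mask of the central cell. A minimal NumPy sketch of that ratio follows. It is an illustration under stated assumptions, not the authors' implementation: the function name `grad_camo`, the `eps` guard, and the NaN convention for an all-zero map are hypothetical choices, and the localization map is assumed to have already been resampled to the same shape as the crop's segmentation mask.

```python
import numpy as np


def grad_camo(cam, mask, eps=1e-8):
    """Fraction of Grad-CAM mass that falls inside the central cell's mask.

    cam  : non-negative localization map (2D or 3D), assumed to be already
           resampled to the crop's shape (hypothetical preprocessing step).
    mask : boolean array of the same shape, True inside the central cell.
    """
    # Clip defensively so stray negative values cannot distort the ratio.
    cam = np.clip(np.asarray(cam, dtype=np.float64), 0.0, None)
    mask = np.asarray(mask, dtype=bool)
    if cam.shape != mask.shape:
        raise ValueError("cam and mask must have the same shape")

    total = cam.sum()
    if total <= eps:
        # The map carries no attention at all, so the score is undefined.
        return float("nan")
    return float(cam[mask].sum() / total)


# Toy 4 x 4 x 2 crop whose central 2 x 2 column of voxels is the cell.
mask = np.zeros((4, 4, 2), dtype=bool)
mask[1:3, 1:3, :] = True

uniform_cam = np.ones((4, 4, 2))          # attention spread over everything
on_target_cam = mask.astype(np.float64)   # attention only on the cell

uniform_score = grad_camo(uniform_cam, mask)      # 8 of 32 voxels
on_target_score = grad_camo(on_target_cam, mask)  # all mass inside the mask
```

Under this sketch a score near 1 means the attention is concentrated on the cell of interest, while a score near 0 means it sits on the background or on neighboring cells. Averaging the per-cell scores over a held-out set (for example with `np.nanmean`) gives the dataset-level summary mentioned in the abstract.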