
A SSIM Guided cGAN Architecture For Clinically Driven Generative Image Synthesis of Multiplexed Spatial Proteomics Channels

Jillur Rahman Saurav, Mohammad Sadegh Nasr, Paul Koomey, Michael Robben, Manfred Huber, Jon Weidanz, Bríd Ryan, Eytan Ruppin, Peng Jiang, Jacob M. Luber

TL;DR

To address missing channels in multiplexed spatial proteomics, the authors introduce an SSIM-guided conditional GAN that performs image-to-image synthesis to generate photo-accurate protein channels. The model optimizes the objective $\mathcal{L}_{GAN}(G,D) + \lambda \mathcal{L}_{L1}(G)$ and uses SSIM-based channel clustering to train on a minimal yet informative set of input channels, enabling scaling to slides with up to 100 channels when trained on HuBMAP data. Key contributions include the SSIM-driven channel selection method, cross-tissue validation (HuBMAP training data and a new lung adenocarcinoma data set), and a discussion of clinical deployment and ethics. The work suggests substantial potential to reduce staining costs and increase data throughput in pathology, while underscoring the need for rigorous clinical validation and governance for real-world use.
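The objective above has the same form as pix2pix: an adversarial term plus a weighted L1 reconstruction term. A minimal NumPy sketch of both sides of that game is below. It assumes the discriminator outputs probabilities, a non-saturating binary cross-entropy for the generator's adversarial term, and λ = 100 (the pix2pix default, not a value confirmed for this model); the function names are hypothetical.

```python
import numpy as np


def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between probabilities `pred` and 0/1 `target`."""
    pred = np.clip(pred, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))))


def generator_loss(d_fake, fake, real, lam=100.0):
    """L_GAN + lam * L_L1 from the generator's side (pix2pix-style sketch).

    d_fake: discriminator probabilities D(x, G(x)) for the generated channel.
    fake, real: generated and ground-truth channel images of equal shape.
    lam: weight of the L1 term; 100 is the pix2pix default, assumed here.
    """
    adversarial = bce(d_fake, np.ones_like(d_fake))  # G wants D to label fakes as real
    l1 = float(np.mean(np.abs(real - fake)))          # pixel-wise reconstruction error
    return adversarial + lam * l1


def discriminator_loss(d_real, d_fake):
    """D is trained to output 1 on (input, real) pairs and 0 on (input, fake) pairs."""
    return bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))


real = np.random.default_rng(0).random((64, 64))
# A perfect reconstruction with an undecided discriminator leaves only the BCE term.
print(generator_loss(np.full((30, 30), 0.5), real, real))  # ≈ 0.693 (ln 2)
```

The L1 term pulls each generated channel toward the stained ground truth pixel by pixel, while the adversarial term penalizes outputs the discriminator can tell apart from real stains; λ trades the two off.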

Abstract

Here we present a structural similarity index measure (SSIM) guided conditional Generative Adversarial Network (cGAN) that performs image-to-image (i2i) synthesis to generate photo-accurate protein channels in multiplexed spatial proteomics images. This approach can be used to accurately generate missing spatial proteomics channels that were not included during experimental data collection, either at the bench or in the clinic. Experimental spatial proteomic data from the Human BioMolecular Atlas Program (HuBMAP) were used to generate spatial representations of missing proteins through a U-Net based image synthesis pipeline. HuBMAP channels were hierarchically clustered by SSIM as a heuristic to obtain the minimal set of channels needed to recapitulate the underlying biology represented by the spatial landscape of proteins. We subsequently show that our SSIM-based architecture scales generative image synthesis to slides with up to 100 channels, whereas current state-of-the-art algorithms are limited to data with 11 channels. We validate these claims by generating a new experimental spatial proteomics data set from human lung adenocarcinoma tissue sections and showing that a model trained on HuBMAP can accurately synthesize channels from this new data set. The ability to recapitulate experimental data from sparsely stained multiplexed histological slides containing spatial proteomic data will have a tremendous impact on medical diagnostics and drug development, and also raises important questions about the medical ethics of using data produced by generative image synthesis in the clinical setting. The algorithm that we present in this paper will allow researchers and clinicians to save time and costs in proteomics-based histological staining while also increasing the amount of data that they can generate through their experiments.
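The SSIM-based channel selection can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses a simplified global SSIM (one window over the whole image rather than the usual sliding Gaussian window), 1 − SSIM as the pairwise distance, average-linkage agglomerative clustering, and keeps one medoid channel per cluster as the "minimal set". All function names are hypothetical.

```python
import numpy as np


def global_ssim(a, b, data_range=1.0):
    """SSIM over the whole image as a single window (simplified; standard SSIM
    averages this statistic over local Gaussian windows)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (a.var() + b.var() + c2)
    return num / den


def ssim_distance_matrix(channels):
    """Pairwise distance 1 - SSIM between channels of shape (n, H, W)."""
    n = len(channels)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = 1.0 - global_ssim(channels[i], channels[j])
    return dist


def average_linkage(dist, n_clusters):
    """Repeatedly merge the two clusters with the smallest mean pairwise
    distance until only n_clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = dist[np.ix_(clusters[p], clusters[q])].mean()
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        merged = sorted(clusters[p] + clusters[q])
        del clusters[q]  # q > p, so index p is still valid
        clusters[p] = merged
    return clusters


def medoids(dist, clusters):
    """One representative per cluster: the member closest on average to the others."""
    return [min(c, key=lambda i: dist[i, c].mean()) for c in clusters]


def add_noise(x, rng, scale=0.05):
    """Slightly perturbed copy of a channel, kept in [0, 1]."""
    return np.clip(x + scale * rng.standard_normal(x.shape), 0.0, 1.0)


rng = np.random.default_rng(0)
base1, base2 = rng.random((2, 32, 32))
channels = np.stack([base1, add_noise(base1, rng), base2, add_noise(base2, rng)])
dist = ssim_distance_matrix(channels)
clusters = average_linkage(dist, n_clusters=2)
print(clusters, medoids(dist, clusters))
```

With two pairs of near-duplicate channels, the sketch groups each pair together and keeps one channel from each, so two inputs stand in for four. In practice a library implementation of windowed SSIM and hierarchical clustering would replace these hand-rolled helpers.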

Paper Structure

This paper contains 7 sections, 9 equations, and 5 figures.

Figures (5)

  • Figure 1: Overview of the model architecture, showing how DAPI is generated. (A) A cGAN model was used to predict missing channels from multiplexed spatial proteomic data generated on the CODEX/PhenoCycler platform. Data were randomly split into training and validation sets. Iterative models were developed by including multiple marker channels. The U-Net is illustrated using the visualkeras package (Gavrikov2020VisualKeras). (B) Heuristic clustering by SSIM removed the presence bias of marker channels in test image data sets. (C) The cGAN accurately predicted DAPI channels in HuBMAP data sets. Generated images are single-channel and shown in false color.
  • Figure 2: Real stained protein channels displayed next to their generated single-channel predictions for four channels: Podoplanin, PanCK, Vimentin, and SMActin.
  • Figure 3: SSIM-based clustering of multiplexed protein channels improves prediction in sparse multichannel protein data sets. (A) Clusters of single-channel protein data are shown in different colors on the SSIM dendrogram. (B) Loss values for cluster-based marker selection compared with random assignment of markers sampled from all clusters, as a function of the percentage of total markers sampled from clusters. (C) Correlation between single protein channels reinforces the cluster assignment; the color scale on the right shows correlation values between -1 and 1.
  • Figure 4: Normalized generator loss for training and test sets when selecting channels by SSIM clustering, using different numbers of training channels on the 29-channel HuBMAP dataset.
  • Figure 5: Generated images intended to assist with diagnostic tasks in the clinic, taken from a lung adenocarcinoma patient: (A) actual CD8 marker density (yellow) against background (blue); (B) the generated version of (A), where the CD8 marker was not included as input; (C) CD8 density (red) plotted with PanCK (green); (D) the generated version of (C).
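Figure 3C uses channel-to-channel correlation to corroborate the SSIM clusters. A small sketch of that kind of check is below, assuming Pearson correlation over flattened pixel intensities and integer cluster labels produced by a separate clustering step; the helper names and the within/between summary are illustrative assumptions, not the paper's procedure.

```python
import numpy as np


def channel_correlation(channels):
    """Pearson correlation between every pair of channels (n, H, W) -> (n, n),
    values in [-1, 1]. A constant channel has undefined correlation and yields NaN."""
    return np.corrcoef(channels.reshape(len(channels), -1))


def within_vs_between(corr, labels):
    """Mean correlation over channel pairs in the same cluster vs. different clusters."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)  # ignore each channel's self-correlation
    return float(corr[same & off_diag].mean()), float(corr[~same].mean())


rng = np.random.default_rng(0)
a, b = rng.random((2, 32, 32))
channels = np.stack([a, a + 0.05 * rng.standard_normal(a.shape),
                     b, b + 0.05 * rng.standard_normal(b.shape)])
corr = channel_correlation(channels)
within, between = within_vs_between(corr, labels=[0, 0, 1, 1])
print(f"within={within:.2f} between={between:.2f}")
```

If the SSIM clusters reflect shared spatial structure, within-cluster correlation should be clearly higher than between-cluster correlation, which is the pattern a heatmap like Figure 3C makes visible.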