Can virtual staining for high-throughput screening generalize?

Samuel Tonks, Cuong Nguyen, Steve Hood, Ryan Musso, Ceridwen Hopely, Steve Titus, Minh Doan, Iain Styles, Alexander Krull

TL;DR

Training virtual nuclei and cytoplasm models on non-toxic condition samples not only generalizes to toxic condition samples but also improves performance across all evaluation levels compared to training on toxic condition samples.

Abstract

The large volume and variety of imaging data from high-throughput screening (HTS) in the pharmaceutical industry present an excellent resource for training virtual staining models. However, the potential of models trained under one set of experimental conditions to generalize to other conditions remains underexplored. This study systematically investigates whether data from three cell types (lung, ovarian, and breast) and two phenotypes (toxic and non-toxic conditions) commonly found in HTS can effectively train virtual staining models to generalize across three typical HTS distribution shifts: unseen phenotypes, unseen cell types, and the combination of both. Using a dataset of 772,416 paired bright-field, cytoplasm, nuclei, and DNA-damage stain images, we evaluate the generalization capabilities of models at the pixel-based, instance-wise, and biological-feature-based levels. Our findings indicate that training virtual nuclei and cytoplasm models on non-toxic condition samples not only generalizes to toxic condition samples but also improves performance across all evaluation levels compared to training on toxic condition samples. Generalization to unseen cell types varies with the cell type: models trained on ovarian or lung cell samples often perform well under other conditions, while those trained on breast cell samples consistently generalize poorly. Generalizing to both unseen cell types and unseen phenotypes performs well across all levels of evaluation compared to generalizing to unseen cell types alone. This study represents the first large-scale, data-centric analysis of the generalization capability of virtual staining models trained on diverse HTS datasets, providing valuable strategies for experimental training data generation.

Paper Structure (7 sections, 6 figures)

Figures (6)

  • Figure 1: The GSK HTS dataset comprises three cell types (ovarian, breast, and lung) and two phenotypes (non-toxic and toxic). Each 2x4 grid shows 4 randomly selected bright-field and fluorescence stain image pairs (shown as a composite image) for each cell type and phenotype. The composite image shows the nuclei stain (DAPI) in red, the cytoplasm stain (FITC) in cyan, and the DNA-damage stain (Cy5) in yellow. Within the dataset, we observe variability within the toxic and non-toxic samples of each cell type as well as anatomical differences between the different cell types. We explore the generalization performance across all three virtual staining tasks for three common HTS data distribution shifts: generalizing to new phenotypes, generalizing to new cell types, and both combined.
  • Figure 2: Generalization performance of virtual staining models to an unseen phenotype across three levels of evaluation. Each chart shows the results for one metric; within each chart, all virtual stain channels are shown separately and grouped by cell type. Each bar shows the average difference between the virtual stain models trained on non-toxic samples and the baseline virtual stain models trained on toxic samples. For all three cell types and virtual stain tasks, the PSNR, Jaccard Index, and F1 Score results reveal improved performance from training on non-toxic samples compared to training on toxic samples. Consistently across all metrics, training on ovarian non-toxic samples leads to improved performance when generalizing to images of ovarian toxic cells.
  • Figure 3: Qualitative results for the task of generalizing to an unseen phenotype: ovarian toxic from ovarian non-toxic. Randomly selected bright-field images and paired fluorescence images for the nuclei, cytoplasm, and DNA-damage stains are shown alongside the virtual staining predictions from the virtual stain models trained on ovarian toxic samples and the virtual stain models trained on ovarian non-toxic samples. We observe that the general shape of nuclei and cells is reproduced well relative to the baseline and fluorescence stain. The DNA-damage spots are considerably different from the fluorescence stain for both the baseline and the model trained on non-toxic samples. Examples for all three virtual stain tasks are indicated by yellow arrows.
  • Figure 4: N-MAE values of the virtual stain models trained on ovarian non-toxic and ovarian toxic samples for the 20 CellProfiler features identified on the fluorescence stain. The green line shows the average N-MAE for the virtual stain models trained on ovarian non-toxic samples and the yellow line shows the average N-MAE for the virtual stain models trained on ovarian toxic samples. Across a diverse set of features, training on ovarian non-toxic leads to a biological feature representation that more closely aligns with that found in the fluorescence stains compared to training on ovarian toxic.
  • Figure 5: Qualitative results for the task of generalizing to unseen cell types. Randomly selected bright-field and fluorescence images for the nuclei, cytoplasm, and DNA-damage stains are shown with the virtual stain from the baseline model and from the virtual stain models that generalize well and generalize poorly. Both the virtual nuclei and virtual cytoplasm models trained on images of ovarian non-toxic cells can reproduce the general shape of the lung cells, and in some cases the intensity profile, well relative to the baseline model trained on images of lung cells. Meanwhile, the models trained on images of breast cells show a considerable number of missing nuclei and cytoplasm as well as incorrect morphology. For the virtual DNA-damage stain, although the model trained on lung does well relative to the model trained on breast, there are still very clear differences in intensity and DNA-damage spot locations between all three virtual stain predictions and the fluorescence stain.
  • ...and 1 more figures
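
The Figure 2 caption describes each bar as the average difference between a model trained on non-toxic samples and a baseline trained on toxic samples, for metrics such as PSNR and the Jaccard Index. A minimal sketch of that comparison is given below for illustration only. It assumes images scaled to [0, 1] and binary foreground masks that have already been produced by some segmentation step; the function names (`psnr`, `jaccard_index`, `mean_difference_to_baseline`) are hypothetical and are not taken from the paper, whose exact evaluation pipeline may differ.

```python
import numpy as np


def psnr(target, prediction, data_range=1.0):
    """Peak signal-to-noise ratio in dB (assumes intensities span data_range)."""
    target = np.asarray(target, dtype=np.float64)
    prediction = np.asarray(prediction, dtype=np.float64)
    mse = np.mean((target - prediction) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / mse))


def jaccard_index(target_mask, predicted_mask):
    """Intersection over union of two binary masks (assumed pre-segmented)."""
    target_mask = np.asarray(target_mask, dtype=bool)
    predicted_mask = np.asarray(predicted_mask, dtype=bool)
    union = np.logical_or(target_mask, predicted_mask).sum()
    if union == 0:
        return 1.0  # both masks empty: treated as perfect agreement here
    intersection = np.logical_and(target_mask, predicted_mask).sum()
    return float(intersection / union)


def mean_difference_to_baseline(candidate_scores, baseline_scores):
    """Average per-image score difference (candidate minus baseline)."""
    candidate_scores = np.asarray(candidate_scores, dtype=np.float64)
    baseline_scores = np.asarray(baseline_scores, dtype=np.float64)
    return float(np.mean(candidate_scores - baseline_scores))


if __name__ == "__main__":
    # Synthetic stand-ins for fluorescence targets and virtual stains.
    rng = np.random.default_rng(0)
    targets = rng.random((4, 32, 32))
    candidate = np.clip(targets + rng.normal(0, 0.02, targets.shape), 0, 1)
    baseline = np.clip(targets + rng.normal(0, 0.10, targets.shape), 0, 1)

    candidate_psnr = [psnr(t, p) for t, p in zip(targets, candidate)]
    baseline_psnr = [psnr(t, p) for t, p in zip(targets, baseline)]
    # A positive value means the candidate beats the baseline on average.
    print(mean_difference_to_baseline(candidate_psnr, baseline_psnr))
```

For metrics where higher is better, such as PSNR and the Jaccard Index, a positive difference corresponds to a bar favoring the model trained on non-toxic samples.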