Clinically Relevant Latent Space Embedding of Cancer Histopathology Slides through Variational Autoencoder Based Image Compression

Mohammad Sadegh Nasr; Amir Hajighasemi; Paul Koomey; Parisa Boodaghi Malidarreh; Michael Robben; Jillur Rahman Saurav; Helen H. Shang; Manfred Huber; Jacob M. Luber

Abstract

In this paper, we introduce a Variational Autoencoder (VAE) based training approach that can compress and decompress cancer pathology slides at a compression ratio of 1:512, which is better than the previously reported state of the art (SOTA), while still maintaining accuracy in clinical validation tasks. We also test the compression approach on more common computer vision datasets such as CIFAR10, and we explore which image characteristics enable this compression ratio on cancer imaging data but not on generic images. We generate and visualize embeddings from the compressed latent space and demonstrate how they are useful for clinical interpretation of data, and how such latent embeddings could in the future be used to accelerate search of clinical imaging data.

Paper Structure (15 sections, 1 equation, 4 figures)

Introduction
Methods
Dataset
Latent Variables and VAE
Training and Inference Pipelines
Dimension Reduction and UMAP
Settings and Experiments
Training Settings
Compression Experiments
Validation Experiments
UMAP Experiments
Results and Conclusion
Data and Code Availability
Compliance with ethical standards
Acknowledgments

Figures (4)

Figure 1: (a) Overview of the VAE training pipeline. (b) Overview of the pipeline at inference. For generating UMAP plots, a similar patch sampling as training is used.
Figure 2: (a) Example of how normalization affects the performance of our pipeline. Both models are trained using the exact same hyper-parameters (latent_dim = 64). (b) The effect of batch size and latent dimension on validation loss. For better visualization, early stopping is not used for these experiments.
Figure 3: Effect of dataset entropy and color content on performance. All hyper-parameters are the same for all 5 models.
Figure 4: (a) The reconstruction results for breast cancer tissues at 5 different compression ratios. (b) UMAP plot generated on 4 different tissue types with a compression ratio of 1:64.
