MicroSSIM: Improved Structural Similarity for Comparing Microscopy Data

Ashesh Ashesh, Joran Deschamps, Florian Jug

TL;DR

It is shown that SSIM components behave unexpectedly when the prediction generated from low-SNR input is compared with the corresponding high-SNR data, and the phenomenon of saturation is introduced, where SSIM components become less sensitive to (dis)similarity between the images.

Abstract

Microscopy is routinely used to image biological structures of interest. Due to imaging constraints, acquired images, also called micrographs, are typically low-SNR and contain noise. Over the last few years, regression-based tasks like unsupervised denoising and splitting have found utility in working with such noisy micrographs. For evaluation, Structural Similarity (SSIM) is one of the most popular measures used in the field. For such tasks, the best evaluation would be when both low-SNR noisy images and corresponding high-SNR clean images are obtained directly from a microscope. However, due to the following three peculiar properties of microscopy data, we observe that SSIM is not well suited to this data regime: (a) high-SNR micrographs have higher-intensity pixels than low-SNR micrographs, (b) high-SNR micrographs have higher-intensity pixels than natural images, the images for which SSIM was developed, and (c) a digitally configurable offset is added by the detector inside the microscope, which affects the SSIM value. We show that SSIM components behave unexpectedly when the prediction generated from low-SNR input is compared with the corresponding high-SNR data. We explain this by introducing the phenomenon of saturation, where SSIM components become less sensitive to (dis)similarity between the images. We propose an intuitive way to quantify this, which explains the observed SSIM behavior. We introduce MicroSSIM, a variant of SSIM, which overcomes the above-discussed issues. We justify the soundness and utility of MicroSSIM using theoretical and empirical arguments and show the utility of MicroSSIM on two tasks: unsupervised denoising and joint image splitting with unsupervised denoising. Since our formulation can be applied to a broad family of SSIM-based measures, we also introduce MicroMS3IM, a microscopy-specific variation of MS-SSIM.
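
For orientation, the three SSIM components mentioned in the abstract can be written in their standard textbook form. The notation is introduced here for illustration and is not taken from the text above: $\mu_x, \mu_y$ are local means, $\sigma_x, \sigma_y$ local standard deviations, $\sigma_{xy}$ the local covariance, $C_1, C_2, C_3$ small stabilizing constants, and $o$ a hypothetical offset added to both images.

$$
l(x,y)=\frac{2\mu_x\mu_y+C_1}{\mu_x^2+\mu_y^2+C_1},\qquad
c(x,y)=\frac{2\sigma_x\sigma_y+C_2}{\sigma_x^2+\sigma_y^2+C_2},\qquad
s(x,y)=\frac{\sigma_{xy}+C_3}{\sigma_x\sigma_y+C_3}
$$

$$
\text{SSIM}(x,y)=l(x,y)\,c(x,y)\,s(x,y)
$$

Rearranging the luminance term shows one way in which an offset can reduce sensitivity:

$$
l(x,y)=1-\frac{(\mu_x-\mu_y)^2}{\mu_x^2+\mu_y^2+C_1},
\qquad
l(x+o,\,y+o)=1-\frac{(\mu_x-\mu_y)^2}{(\mu_x+o)^2+(\mu_y+o)^2+C_1}
$$

For non-negative means, a growing offset $o$ enlarges the denominator while the numerator stays fixed, so the luminance term approaches 1 regardless of how different the two means are. This is only a sketch of the effect described in point (c), under the assumption that $C_1$ is held fixed; it is not necessarily the formal definition of saturation used in the paper.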

Paper Structure

This paper contains 21 sections, 15 equations, 23 figures, 3 tables.

Figures (23)

  • Figure 1: Failure mode of SSIM on microscopy data. (A) A noisy microscopy image (a micrograph), its denoised version predicted by N2V, and the corresponding high-SNR (noise-free) ground truth. Evaluating denoising quality is problematic because the pixel intensity distributions of the ground truth and the prediction (shown in the respective insets) are very different. This is especially true for the foreground content, which comprises the brighter pixels, so applying SSIM directly does not give a sensible value. We solve this issue with $\mathbb{M}\text{icroSSIM}$. (B) An example of apparently counter-intuitive SSIM behavior. The SSIM between a natural image (taken from ImageNet [imagenet]) and a pure-noise image drawn from the uniform distribution is much lower than the SSIM between a micrograph and a noise image drawn from the same distribution. The natural expectation is $\text{SSIM}\approx 0$ in both cases. We solve this issue with $\mathbb{M}\text{icroSSIM}$ and appropriate data pre-processing, as shown in the bottom-right plot, where 30 random microscopy images are used.
  • Figure 2: Comparison with baselines. Different SSIM variants are applied between the GT and the N2V prediction, and the three SSIM components (luminance, contrast, and structure) are shown. The first row shows the noisy input, the prediction, and the noise-free ground truth, with inset histograms showing the distribution of pixel intensities. We show three examples from the Actin and Mitochondria channels of the Hagen et al. dataset. We compare SSIM, i.e. vanilla SSIM on unnormalized images (row 2); SSIM Normalized, i.e. SSIM on mean-std normalized images (row 3); CARE-SSIM, i.e. SSIM with scaling and normalization as proposed in CARE [Weigert2018-pi] (row 4); and MicroSSIM (row 5). First, the structure component of SSIM is saturated even though the prediction is visibly different from the GT. Second, SSIM Normalized and CARE-SSIM are highly sensitive to background regions. This does not align with our expectation that the metric should report poor performance when the foreground content is dissimilar and should be less sensitive to background regions. MicroSSIM does not appear to suffer from either issue.
  • Figure 3: Role of pre-processing in the saturation factor $\Delta$ for the Mitochondria denoising task. The top and bottom panels investigate the role of background and downscaling, respectively. None of the baselines used here use $\alpha$; only $\mathbb{M}\text{icroSSIM}$ does. Lower $\Delta$ values indicate higher sensitivity of SSIM to the (dis)similarity between GT and prediction, so lower values are preferable. In the second row of both panels, we compare the GT with a pure-noise image drawn from a uniform distribution; in this case, lower SSIM values are better. (Top) The With-Bkg baseline does not remove the background in the pre-processing step, whereas the Inst-bkg-rm baseline first estimates the background separately from the given GT and prediction images and then removes it from each. Note that in $\mathbb{M}\text{icroSSIM}$, the background is not estimated per image but once at the dataset level. (Bottom) The Downscale and No-Downscale baselines, like $\mathbb{M}\text{icroSSIM}$, remove the background in the pre-processing step. Downscale additionally divides the GT and prediction by the maximum pixel value in the GT; No-Downscale does not.
  • Figure 4: Background removal ablation. We show SSIM components and the estimated $\alpha$ with mean removal (row 2), with background removal disabled (row 3), and with background removal enabled (row 4). We use two random full frames from the Actin and Mito sub-datasets. Without background removal, $\alpha$ is underestimated. With mean removal, $\alpha$ is overestimated.
  • Figure 5: (Left: Role of offset in luminance) In this ablation, after applying our proposed normalization to the GT and the prediction, we add an offset to both and plot the mean SSIM and the mean of each of its components. The two plots, made using two random full frames from the Actin (row 1) and Mito (row 2) sub-datasets, show that the offset does influence the luminance and therefore the M-SSIM value. This means that the detector offset set by the microscopist plays a role in the SSIM computation, whereas ideally only the denoising model's performance should. We ensure this by removing the background in our normalization. (Right: Uniqueness of $\alpha$) Here, we took 30 random full frames from the Actin and Mito sub-datasets. Using the pre-trained N2V model, we obtained denoised predictions and evaluated them with MicroSSIM. Instead of estimating a single value of $\alpha$ for all images, we manually tried different $\alpha$ values and computed SSIM with each. For every curve, a unique $\alpha$ exists that gives the maximal SSIM.
  • ...and 18 more figures
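
The offset ablation described in the Figure 5 caption can be imitated numerically. The sketch below is not the authors' implementation: it uses synthetic data, a hypothetical helper `ssim_components`, global image statistics instead of the usual sliding window, and a fixed `data_range` of 1.0. Under those assumptions it shows the luminance term drifting toward 1 as the same offset is added to both images, while the contrast and structure terms are unaffected.

```python
import numpy as np


def ssim_components(x, y, data_range, k1=0.01, k2=0.03):
    """Luminance, contrast and structure terms from global image statistics.

    Simplification (assumption): one window covering the whole image,
    instead of the local sliding windows used by standard SSIM.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    c3 = c2 / 2.0
    mu_x, mu_y = x.mean(), y.mean()
    sd_x, sd_y = x.std(), y.std()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    luminance = (2 * mu_x * mu_y + c1) / (mu_x**2 + mu_y**2 + c1)
    contrast = (2 * sd_x * sd_y + c2) / (sd_x**2 + sd_y**2 + c2)
    structure = (cov_xy + c3) / (sd_x * sd_y + c3)
    return luminance, contrast, structure


rng = np.random.default_rng(0)
gt = rng.uniform(0.0, 1.0, size=(64, 64))  # synthetic stand-in for a high-SNR image
pred = 0.3 * gt                            # synthetic, dimmer "prediction"

offsets = (0.0, 1.0, 10.0, 100.0)          # hypothetical detector offsets
luminances, contrasts, structures = [], [], []
for offset in offsets:
    lum, con, struct = ssim_components(gt + offset, pred + offset, data_range=1.0)
    luminances.append(lum)
    contrasts.append(con)
    structures.append(struct)
    print(f"offset={offset:6.1f}  luminance={lum:.6f}  "
          f"contrast={con:.6f}  structure={struct:.6f}")
```

Because standard deviation and covariance are invariant to a constant shift, only the luminance term reacts to the offset in this sketch, which matches the caption's remark that the offset influences the luminance.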