Evaluation Metric for Quality Control and Generative Models in Histopathology Images

Pranav Jeevan, Neeraj Nixon, Abhijeet Patil, Amit Sethi

TL;DR

This work tackles the challenge of evaluating generative models for histopathology images under data scarcity, where traditional metrics like FID struggle. It introduces ResNet-L2 (RL2), a metric that trains a normalizing flow on ResNet-18 features extracted from real images to map them into a Gaussian latent space, then computes the RMSE between the latent means of real and generated images: $\text{RL2} = \lVert z_r - z_g \rVert_2$, where $z_r$ and $z_g$ are the mean latent vectors of the real and generated sets. The method emphasizes domain-specific feature representation and computational efficiency, responding monotonically to blur, noise, and diffusion while requiring far fewer samples than FID-based approaches. In practice, RL2 enables reliable quality assessment and patch filtering in whole-slide image pipelines, which is particularly useful when data is limited and rapid evaluation is needed.

Abstract

Our study introduces ResNet-L2 (RL2), a novel metric for evaluating generative models and image quality in histopathology, addressing limitations of traditional metrics, such as Fréchet inception distance (FID), when data is scarce. RL2 leverages ResNet features with a normalizing flow to calculate the RMSE distance in the latent space, providing reliable assessments across diverse histopathology datasets. We evaluated the performance of RL2 on several degradation types (blur, Gaussian noise, salt-and-pepper noise, and rectangular patches) as well as on diffusion processes. RL2's monotonic response to increasing degradation makes it well-suited for models that assess image quality, making it a valuable advancement for evaluating image generation techniques in histopathology. It can also be used to discard low-quality patches while sampling from a whole slide image. It is also significantly lighter and faster than traditional metrics and requires fewer images to give a stable metric value.
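The final step of the metric, the distance between mean latent vectors, can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: it assumes the latent vectors have already been produced by the trained ResNet plus normalizing-flow network (not shown), and the function name `rl2_distance` and the array layout `(n_images, latent_dim)` are hypothetical choices made for this example.

```python
import numpy as np


def rl2_distance(z_real: np.ndarray, z_eval: np.ndarray) -> float:
    """L2 distance between the mean latent vectors of two image sets.

    Assumption: each input has shape (n_images, latent_dim) and holds the
    latent vectors produced by the trained ResNet + normalizing-flow network.
    """
    z_real = np.asarray(z_real, dtype=np.float64)
    z_eval = np.asarray(z_eval, dtype=np.float64)
    if z_real.ndim != 2 or z_eval.ndim != 2:
        raise ValueError("expected 2-D arrays of shape (n_images, latent_dim)")
    if z_real.shape[1] != z_eval.shape[1]:
        raise ValueError("latent dimensions of the two sets must match")

    # Average over images first, then measure the distance between the means.
    mean_real = z_real.mean(axis=0)
    mean_eval = z_eval.mean(axis=0)
    return float(np.linalg.norm(mean_real - mean_eval))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in latents: real images near the origin, evaluation images shifted.
    z_real = rng.normal(loc=0.0, scale=1.0, size=(64, 16))
    z_eval = rng.normal(loc=0.5, scale=1.0, size=(64, 16))
    print("RL2 (real vs. shifted):", rl2_distance(z_real, z_eval))
    print("RL2 (real vs. itself): ", rl2_distance(z_real, z_real))
```

The text describes the quantity both as an RMSE and as an L2 distance. The sketch uses the plain L2 norm of the difference of means, which matches the formula in the TL;DR; an RMSE variant would differ only by a constant factor of $1/\sqrt{d}$ for latent dimension $d$, so the ordering of scores would be unchanged.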

Paper Structure

This paper contains 6 sections, 4 equations, 5 figures.

Figures (5)

  • Figure 1: The process of computing RL2. In the training phase, the ResNet-normalizing flow network is trained on the given real (high-quality) images. In the evaluation phase, real and generated (or evaluation) images are passed through the network, and the L2 distance between the means of the latent vectors of the real and generated (or evaluation) images is used as the final metric.
  • Figure 2: RL2 responds monotonically to increasing levels of blur caused by variation of z-levels in the FocusPath dataset. The top row shows the RL2 value and the bottom row shows the z-values.
  • Figure 3: RL2 responds monotonically to increasing levels of salt-and-pepper noise in histopathology images from the $0$ z-level of the FocusPath dataset. The top row shows the RL2 value and the bottom row shows the noise values.
  • Figure 4: RL2 responds monotonically to increasing levels of rectangular patch noise in histopathology images from the $0$ z-level of the FocusPath dataset. The top row shows the RL2 value and the bottom row shows the noise values.
  • Figure 5: RL2 responds monotonically to increasing levels of salt-and-pepper noise in histopathology images from the $0$ z-level of the FocusPath dataset.
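
The degradation experiments summarized in the captions above can be mimicked with simple synthetic corruptions. The sketch below is illustrative only: the function names, the `amount` parameterization of salt-and-pepper noise, and the patch placement are assumptions for this example, not the settings used in the paper. It also includes a small helper for checking that a sequence of metric values is monotonic.

```python
import numpy as np


def add_salt_and_pepper(image: np.ndarray, amount: float,
                        rng: np.random.Generator) -> np.ndarray:
    """Set roughly a fraction `amount` of pixels to black (0) or white (255).

    Assumption: `image` is a uint8 array of shape (H, W) or (H, W, C).
    """
    noisy = image.copy()
    mask = rng.random(image.shape[:2])      # one draw per pixel location
    noisy[mask < amount / 2] = 0            # "pepper"
    noisy[mask > 1 - amount / 2] = 255      # "salt"
    return noisy


def add_rectangular_patch(image: np.ndarray, height: int, width: int,
                          top: int = 0, left: int = 0,
                          value: int = 0) -> np.ndarray:
    """Overwrite a height x width rectangle with a constant value."""
    patched = image.copy()
    patched[top:top + height, left:left + width] = value
    return patched


def is_monotonic_non_decreasing(values) -> bool:
    """True if every value is at least as large as the one before it."""
    values = list(values)
    return all(b >= a for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = np.full((64, 64, 3), 128, dtype=np.uint8)  # flat grey stand-in patch
    for amount in (0.0, 0.1, 0.3, 0.5):
        noisy = add_salt_and_pepper(image, amount, rng)
        changed = float((noisy != image).any(axis=-1).mean())
        print(f"amount={amount:.1f} -> fraction of corrupted pixels={changed:.3f}")
```

In an actual evaluation, each degraded set would be passed through the trained network, and the resulting RL2 values, ordered by degradation level, would be passed to `is_monotonic_non_decreasing`.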