Evaluation Metric for Quality Control and Generative Models in Histopathology Images
Pranav Jeevan, Neeraj Nixon, Abhijeet Patil, Amit Sethi
TL;DR
This work tackles the challenge of evaluating generative models for histopathology images under data scarcity, where traditional metrics such as FID struggle. It introduces ResNet-L2 (RL2), a metric that extracts ResNet-18 features from real images, trains a normalizing flow to map those features into a Gaussian latent space, and then computes the RMSE between the latent means of real and generated images: $\text{RL2} = \lVert z_r - z_g \rVert_2$, where $z_r$ and $z_g$ denote the latent means of the real and generated sets. The method emphasizes domain-specific feature representation and computational efficiency: it responds monotonically to increasing blur, noise, and diffusion, and it requires far fewer samples than FID-based approaches. In practice, RL2 enables reliable quality assessment and patch filtering in whole-slide image pipelines, which is particularly useful when data is limited and rapid evaluation is needed.
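The last step of the pipeline described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the ResNet-18 features have already been passed through a trained normalizing flow, so each image is represented by a plain list of latent values. The function names `latent_mean` and `rl2_distance` are hypothetical. The sketch follows the formula as written (an L2 norm of the difference of means); a strict RMSE would additionally divide by the square root of the latent dimension.

```python
import math
from typing import List, Sequence


def latent_mean(latents: Sequence[Sequence[float]]) -> List[float]:
    """Average a batch of latent vectors component-wise."""
    if not latents:
        raise ValueError("need at least one latent vector")
    dim = len(latents[0])
    if any(len(z) != dim for z in latents):
        raise ValueError("all latent vectors must have the same length")
    n = len(latents)
    return [sum(z[i] for z in latents) / n for i in range(dim)]


def rl2_distance(
    real_latents: Sequence[Sequence[float]],
    generated_latents: Sequence[Sequence[float]],
) -> float:
    """L2 norm of the difference between the two latent means.

    Assumption: inputs are flow-mapped latents, one vector per image.
    """
    z_r = latent_mean(real_latents)
    z_g = latent_mean(generated_latents)
    if len(z_r) != len(z_g):
        raise ValueError("real and generated latents must share a dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(z_r, z_g)))


# Toy 2-D latents (illustrative values only, not real data).
real = [[0.0, 0.0], [2.0, 0.0]]       # mean is [1.0, 0.0]
generated = [[4.0, 4.0], [4.0, 4.0]]  # mean is [4.0, 4.0]
score = rl2_distance(real, generated)
print(score)
```

In this toy case the two means differ by (3, 4), so the distance is 5.0; identical sets give 0.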
Abstract
Our study introduces ResNet-L2 (RL2), a novel metric for evaluating generative models and image quality in histopathology, addressing limitations of traditional metrics, such as Fréchet inception distance (FID), when data is scarce. RL2 leverages ResNet features with a normalizing flow to calculate the RMSE distance in the latent space, providing reliable assessments across diverse histopathology datasets. We evaluated the performance of RL2 on several degradation types (blur, Gaussian noise, salt-and-pepper noise, and rectangular patches) as well as on diffusion processes. RL2's monotonic response to increasing degradation makes it well suited to models that assess image quality and a valuable advancement for evaluating image generation techniques in histopathology. It can also be used to discard low-quality patches while sampling from a whole slide image. It is also significantly lighter and faster than traditional metrics and requires fewer images to give a stable metric value.
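One way the patch-discarding use case could look in practice is sketched below. The abstract does not specify the filtering rule, so everything here is an assumption: each patch is taken to have a single flow-mapped latent vector, the reference is the latent mean of known-good patches, and a patch is kept when its L2 distance to that reference is at most a cutoff. The names `filter_patches` and `threshold`, and the cutoff value, are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def filter_patches(
    patch_latents: Dict[str, Sequence[float]],
    reference_mean: Sequence[float],
    threshold: float,
) -> List[str]:
    """Return IDs of patches whose latent lies within `threshold`
    of the reference mean (assumed rule, not from the paper)."""
    return [
        patch_id
        for patch_id, z in patch_latents.items()
        if l2_distance(z, reference_mean) <= threshold
    ]


# Toy latents (illustrative values only, not real data).
reference_mean = [0.0, 0.0]
patch_latents = {
    "patch_a": [0.3, 0.4],  # distance 0.5
    "patch_b": [3.0, 4.0],  # distance 5.0
    "patch_c": [0.0, 1.0],  # distance 1.0
}
kept = filter_patches(patch_latents, reference_mean, threshold=1.5)
print(kept)
```

With the toy values, `patch_b` falls outside the cutoff and is dropped. In a real pipeline the cutoff would need to be calibrated on patches of known quality.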
