MExD: An Expert-Infused Diffusion Model for Whole-Slide Image Classification
Jianwei Zhao, Xin Li, Fan Yang, Qiang Zhai, Ao Luo, Yang Zhao, Hong Cheng, Huazhu Fu
TL;DR
MExD tackles the core challenges of whole-slide image (WSI) classification, namely gigapixel-scale images, pervasive non-informative regions, and data imbalance, by integrating a Dynamic Mixture-of-Experts (Dyn-MoE) aggregator with a diffusion-based classifier (Diff-C). The Dyn-MoE selectively routes informative patches to experts and produces an expert-informed prior. Diff-C then denoises a 1-D representation of the slide label conditioned on this prior, so slide-level predictions are generated directly. Presented as the first diffusion-based WSI classification method, this generative approach achieves state-of-the-art results on Camelyon16, TCGA-NSCLC, and BRACS. It also provides robust uncertainty estimates, assessed with PAvPU (patch accuracy versus patch uncertainty). The work points toward a shift from discriminative multiple instance learning (MIL) to generative WSI analysis, with potential impact in medical imaging and beyond.
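To make "denoising a 1-D representation of the slide label conditioned on a prior" concrete, here is a minimal sketch. It assumes a CARD-style, prior-shifted forward process in which the noisy label's mean drifts from the one-hot label toward the prior as noise increases. That process, the linear beta schedule, and every function name below are illustrative assumptions, not the paper's published implementation. The trained denoiser is replaced by an oracle that already knows the injected noise, so the example only demonstrates the algebra of noising and inverting one step.

```python
import math
import random

def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t for a linear (DDPM-style) beta schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def q_sample(y0, prior, alpha_bar, noise):
    """Assumed prior-shifted forward step:
    y_t = sqrt(a) * y0 + (1 - sqrt(a)) * prior + sqrt(1 - a) * noise."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * y + (1.0 - a) * p + s * e for y, p, e in zip(y0, prior, noise)]

def predict_y0(yt, prior, alpha_bar, eps_hat):
    """Invert q_sample given a noise estimate eps_hat (a trained network in practice)."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(y - (1.0 - a) * p - s * e) / a for y, p, e in zip(yt, prior, eps_hat)]

def softmax(v):
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    z = sum(exps)
    return [x / z for x in exps]

# Usage: a two-class slide whose true label is class 1.
random.seed(0)
alpha_bars = linear_alpha_bars(1000)
y0 = [0.0, 1.0]                # one-hot slide label
prior = [0.3, 0.7]             # hypothetical prior from the aggregator
noise = [random.gauss(0.0, 1.0) for _ in y0]
t = 500
yt = q_sample(y0, prior, alpha_bars[t], noise)
y0_hat = predict_y0(yt, prior, alpha_bars[t], noise)  # oracle noise recovers y0
probs = softmax(y0_hat)        # class distribution read off the denoised vector
```

With a learned noise predictor in place of the oracle, a full sampler would iterate this inversion from the noisiest step down to step 0. Anchoring the noise to the prior means that, at high noise levels, the sample already sits near the aggregator's guess rather than near zero.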
Abstract
Whole Slide Image (WSI) classification poses unique challenges due to the vast image size and numerous non-informative regions, which introduce noise and cause data imbalance during feature aggregation. To address these issues, we propose MExD, an Expert-Infused Diffusion Model that combines the strengths of a Mixture-of-Experts (MoE) mechanism with a diffusion model for enhanced classification. MExD balances the patch feature distribution through a novel MoE-based aggregator that selectively emphasizes relevant information, filtering noise, addressing data imbalance, and extracting essential features. These features are then integrated through a diffusion-based generative process that directly yields the class distribution for the WSI. Moving beyond conventional discriminative approaches, MExD is the first generative strategy for WSI classification, capturing fine-grained details for robust and precise results. We validate MExD on three widely used benchmarks (Camelyon16, TCGA-NSCLC, and BRACS), where it consistently achieves state-of-the-art performance in both binary and multi-class tasks.
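The idea of an MoE aggregator that "selectively emphasizes relevant information" can be illustrated with a generic top-k gated router followed by mean pooling. This is a minimal sketch under stated assumptions, not MExD's Dyn-MoE. The linear gate, the toy experts, and the `min_gate` confidence cutoff (a hypothetical stand-in for filtering non-informative patches) are all illustrative choices.

```python
import math

def softmax(v):
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    z = sum(exps)
    return [x / z for x in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def moe_aggregate(patches, gate_weights, experts, top_k=1, min_gate=0.0):
    """Route each patch to its top-k experts, then mean-pool the kept patches.

    patches:      list of patch feature vectors
    gate_weights: one weight vector per expert (an assumed linear gate)
    experts:      one callable per expert, mapping a feature vector to a vector
    min_gate:     hypothetical cutoff; a patch whose best gate probability is
                  below it is treated as non-informative and dropped
    Returns (pooled slide vector or None, number of patches kept).
    """
    pooled, kept = None, 0
    for x in patches:
        probs = softmax([dot(w, x) for w in gate_weights])
        if max(probs) < min_gate:
            continue  # low-confidence routing: skip this patch
        top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
        norm = sum(probs[i] for i in top)  # renormalize over the chosen experts
        out = None
        for i in top:
            w = probs[i] / norm
            y = [w * v for v in experts[i](x)]
            out = y if out is None else [o + v for o, v in zip(out, y)]
        pooled = out if pooled is None else [p + o for p, o in zip(pooled, out)]
        kept += 1
    if kept == 0:
        return None, 0
    return [p / kept for p in pooled], kept

# Usage: two confidently routed patches and one ambiguous patch.
patches = [[1.0, 0.0], [0.0, 1.0], [0.1, 0.1]]
gate_weights = [[4.0, 0.0], [0.0, 4.0]]
experts = [lambda x: [2.0 * v for v in x], lambda x: [-v for v in x]]
slide_vec, kept = moe_aggregate(patches, gate_weights, experts, top_k=1, min_gate=0.9)
```

Here the third patch splits its gate probability evenly between the two experts, so the cutoff discards it. The slide vector is therefore built only from the two patches the gate routes with confidence. Setting `min_gate=0.0` keeps every patch and reduces the sketch to plain top-k MoE pooling.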
