MExD: An Expert-Infused Diffusion Model for Whole-Slide Image Classification

Jianwei Zhao, Xin Li, Fan Yang, Qiang Zhai, Ao Luo, Yang Zhao, Hong Cheng, Huazhu Fu

TL;DR

MExD tackles the core challenges of whole-slide image classification—gigapixel-scale images, pervasive non-informative regions, and data imbalance—by integrating a Dynamic Mixture-of-Experts (Dyn-MoE) aggregator with a diffusion-based classifier (Diff-C). The Dyn-MoE selectively routes informative patches and produces a prior informed by expert insights, while Diff-C denoises a 1-D representation of the slide label conditioned on these priors, enabling direct generation of slide-level predictions. This generative approach, presented as the first diffusion-based WSI classification method, achieves state-of-the-art results on Camelyon16, TCGA-NSCLC, and BRACS and provides robust uncertainty estimation via PAvPU. The work demonstrates a promising paradigm shift from discriminative MIL toward generative WSI analysis with potential broad impact in medical imaging and beyond.

Abstract

Whole Slide Image (WSI) classification poses unique challenges due to the vast image size and numerous non-informative regions, which introduce noise and cause data imbalance during feature aggregation. To address these issues, we propose MExD, an Expert-Infused Diffusion Model that combines the strengths of a Mixture-of-Experts (MoE) mechanism with a diffusion model for enhanced classification. MExD balances the patch feature distribution through a novel MoE-based aggregator that selectively emphasizes relevant information, effectively filtering noise, addressing data imbalance, and extracting essential features. These features are then integrated via a diffusion-based generative process to directly yield the class distribution for the WSI. Moving beyond conventional discriminative approaches, MExD represents the first generative strategy in WSI classification, capturing fine-grained details for robust and precise results. MExD is validated on three widely used benchmarks (Camelyon16, TCGA-NSCLC, and BRACS), consistently achieving state-of-the-art performance in both binary and multi-class tasks.
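
To make the idea of score-based patch selection concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical linear router (`router_w`), a softmax over experts, and a fixed number of patches kept per expert (`keep_per_expert`); none of these details are specified in the text above. Only the general pattern is taken from the figure captions: each expert retains its highest-scoring patches, which are then mean-pooled.

```python
import numpy as np

def route_and_pool(patches, router_w, keep_per_expert):
    """Toy score-based routing: each expert keeps its top-scoring patches
    and mean-pools them into a single feature vector.

    patches:  (N, D) array of patch features
    router_w: (D, E) hypothetical linear router, one column per expert
    Returns (pooled, scores) with shapes (E, D) and (N, E).
    """
    logits = patches @ router_w                       # (N, E) router logits
    logits -= logits.max(axis=1, keepdims=True)       # stabilize the softmax
    scores = np.exp(logits)
    scores /= scores.sum(axis=1, keepdims=True)       # softmax over experts

    pooled = []
    for e in range(scores.shape[1]):
        # indices of the patches this expert scores highest
        top = np.argsort(scores[:, e])[::-1][:keep_per_expert]
        pooled.append(patches[top].mean(axis=0))      # mean-pool kept patches
    return np.stack(pooled), scores
```

In this toy version, setting `keep_per_expert` to the total number of patches reduces every expert to plain mean-pooling over the whole bag, which is a convenient sanity check.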

Paper Structure

This paper contains 10 sections, 10 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Idea Illustration. (a) Unlike the conventional MIL-based discriminative approaches for WSI processing, (b) we employ the Dyn-MoE to mine effective conditional information from the WSI and use a generative framework to infer classification results.
  • Figure 2: Overview of MExD. Our framework leverages a generative classification diffusion model for enhanced reliability and generalizability, where $\boldsymbol{g}_{\alpha}$ and $\boldsymbol{\rho}_{\theta}$ encode expert insights and prior knowledge, conditioning Diff-C (see Sec. \ref{sec:mexd} for details).
  • Figure 3: Details of the Dyn-MoE Aggregator. Our Dyn-MoE includes $K$ positive experts and 1 negative expert, producing a set of expert insights ($\boldsymbol{g}_{\alpha}$) and a prior prediction ($\boldsymbol{\rho}_{\theta}$) to work alongside Diff-C. 'Avg.' and 'Ada.' indicate mean-pooling and Adapter, respectively. Zoom in for details.
  • Figure 4: Distribution Visualization of patch-wise router scores for positive instances, where each score corresponds to a selected patch. In column 2, expert 1 effectively identifies positive patches, then refines and retains the most representative ones (column 3) through score-based selection. Concurrently, expert 0 focuses on refining negative instances. Green edges indicate cancerous regions.
  • Figure 5: Normality Assumption Assessment: Q-Q plots showing the probability difference between the top two predicted classes within a bag, with two bags selected from each benchmark for visualization.
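
The normality check described for Figure 5 can be illustrated with a short sketch. It rests on stated assumptions rather than on the paper's code: the function name `top2_gap_qq` is hypothetical, the "top two predicted classes" are taken to be the two classes with the highest mean probability across repeated stochastic predictions for one bag, and the standard $(i - 0.5)/n$ plotting positions are used. It returns the theoretical and empirical quantiles that a normal Q-Q plot would show.

```python
import numpy as np
from statistics import NormalDist

def top2_gap_qq(probs):
    """Quantile pairs for a normal Q-Q plot of the top-two probability gap.

    probs: (S, C) class probabilities from S stochastic predictions of one bag.
    Assumes the gaps are not all identical (non-zero standard deviation).
    Returns (theoretical, empirical), each of shape (S,).
    """
    # Assumption: "top two classes" = the two highest mean probabilities.
    order = np.argsort(probs.mean(axis=0))[::-1]
    first, second = order[0], order[1]

    gaps = probs[:, first] - probs[:, second]        # signed per-sample gap
    z = (gaps - gaps.mean()) / gaps.std(ddof=1)      # standardize the gaps
    empirical = np.sort(z)

    n = len(z)
    positions = (np.arange(1, n + 1) - 0.5) / n      # plotting positions in (0, 1)
    std_normal = NormalDist()
    theoretical = np.array([std_normal.inv_cdf(p) for p in positions])
    return theoretical, empirical
```

Plotting `empirical` against `theoretical` gives the Q-Q plot; points lying close to the diagonal suggest the gap is approximately normally distributed for that bag.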