M4: Multi-Proxy Multi-Gate Mixture of Experts Network for Multiple Instance Learning in Histopathology Image Analysis

Junyu Li; Ye Zhang; Wen Shu; Xiaobing Feng; Yingchun Wang; Pengju Yan; Xiaolin Li; Chulin Sha; Min He

TL;DR

Multiple instance learning (MIL) for whole slide images (WSIs) is usually limited to single-task prediction and therefore fails to exploit inter-task relationships among genetic mutations. M4 extends the Multi-gate Mixture-of-Experts (MMoE) framework to the MIL setting by introducing a multi-proxy MIL expert network and per-task gates, enabling joint prediction of multiple mutations from a single WSI. The approach improves average AUC across five TCGA datasets and provides heatmap-based interpretability, with attention concentrated on tumor regions, particularly for rare mutations. By modeling inter-task correlations directly from histopathology images, the work advances multi-task WSI analysis and supports scalable precision-oncology applications.

Abstract

Multiple instance learning (MIL) has been successfully applied to whole slide image (WSI) analysis in computational pathology, enabling a wide range of prediction tasks, from tumor subtyping to inferring genetic mutations and multi-omics biomarkers. However, existing MIL methods predominantly focus on single-task learning, which results in low overall efficiency and neglects inter-task relatedness. To address these issues, we propose an adapted architecture of Multi-gate Mixture-of-experts with Multi-proxy for Multiple instance learning (M4) and apply this framework to the simultaneous prediction of multiple genetic mutations from WSIs. The proposed M4 model has two main innovations: (1) it uses a mixture of experts with multiple gating strategies to predict multiple genetic mutations from a single pathological slide; (2) it constructs a multi-proxy expert network and a multi-proxy gate network for comprehensive and effective modeling of pathological image information. Our model achieved significant improvements across five tested TCGA datasets in comparison to current state-of-the-art single-task methods. The code is available at: https://github.com/Bigyehahaha/M4.
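To make the combination of attention-based MIL pooling and per-task gating more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It omits the multi-proxy components (MP-AMIL and MP-Gate), whose details are not given in this section. The function names, the random weights, the layer sizes, and the use of the mean patch feature as gate input are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(patches, V, w):
    """Attention-based MIL pooling: (n_patches, d) -> slide vector (d,) and patch weights."""
    scores = np.tanh(patches @ V) @ w      # one scalar score per patch
    attn = softmax(scores, axis=0)         # weights over patches sum to 1
    return attn @ patches, attn

def mmoe_mil_forward(patches, experts, gates, towers):
    """Hypothetical MMoE-over-MIL forward pass: one mutation probability per task."""
    # Every expert aggregates the same bag of patches into its own slide-level vector.
    expert_out = np.stack([attention_pool(patches, V, w)[0] for V, w in experts])
    # Assumption: the gates see the mean patch feature (the paper uses a multi-proxy gate).
    slide_summary = patches.mean(axis=0)
    probs = []
    for G, (u, b) in zip(gates, towers):
        mix = softmax(slide_summary @ G)   # task-specific weights over experts
        task_feat = mix @ expert_out       # weighted mixture of expert outputs, shape (d,)
        logit = task_feat @ u + b          # linear tower for this task
        probs.append(1.0 / (1.0 + np.exp(-logit)))
    return np.array(probs)

# Illustrative sizes only; none of these values come from the paper.
rng = np.random.default_rng(0)
n_patches, d, h, n_experts, n_tasks = 50, 16, 8, 3, 4
patches = rng.normal(size=(n_patches, d))  # stand-in for pre-extracted patch features
experts = [(rng.normal(size=(d, h)), rng.normal(size=h)) for _ in range(n_experts)]
gates = [rng.normal(size=(d, n_experts)) for _ in range(n_tasks)]
towers = [(rng.normal(size=d), 0.0) for _ in range(n_tasks)]

probs = mmoe_mil_forward(patches, experts, gates, towers)
print(probs.shape)  # one probability per task
```

The experts are shared across all tasks while each task has its own gate and tower, which is what lets related mutation-prediction tasks reuse slide-level representations.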

Paper Structure (21 sections, 15 equations, 5 figures, 3 tables, 1 algorithm)

Introduction
Related work
Multiple Instance Learning in WSI Analysis
Multi-task Learning
Methods
Revisit MIL and MTL
Attention-Based Multiple Instance Learning
Multi-gate Mixture of Experts Architecture
Self-supervised Pre-trained Feature Extraction Models
M4 Model Framework
Implementation of Experts
Implementation of The Gates
Implementation of Task Specific Towers
Experiments
Dataset Description and Evaluation Details
...and 6 more sections

Figures (5)

Figure 1: Illustration of multi-task prediction of genetic mutations. Given a WSI, a method integrating MMoE and MIL can simultaneously predict multiple genetic mutations.
Figure 2: (a) Overview of the proposed M4 architecture. A set of patches is cropped from the tissue regions of a WSI and passed to a pre-trained patch-level feature extractor to obtain patch feature representations. All patch features are then fed to the expert and gate layers, and a multi-proxy MMoE network aggregates the information from multiple experts for each task. Finally, the aggregated WSI-level features are sent to the corresponding task-specific towers for downstream classification. (b) Multi-proxy expert network MP-AMIL. (c) Multi-proxy gate network MP-Gate.
Figure 3: Radar charts of AUC performance of various models on five TCGA datasets. The red line represents the prediction results of the proposed model M4. The values in the legend represent the area enclosed by the radar charts.
Figure 4: Box plots highlighting the AUC performance of our proposed M4 model under different numbers of experts and different numbers of tasks.
Figure 5: Heatmap visualization for M4. (a) Original slide; purple areas are more likely to be tumor regions. (b) Heatmaps of the slide from single-task AMIL attention scores and from the M4 model, respectively. Warm colors indicate a higher probability that the corresponding location is a region of interest. (c) Heatmaps of the different experts in the M4 model.
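The Figure 3 caption reports the area enclosed by each radar chart as a summary score. The captions do not say how that area is computed. A common convention, assumed here, places the n axes at equal angles, so the polygon splits into n triangles and the area is (1/2) sin(2π/n) Σ r_i r_{i+1}. The function name and the example values below are hypothetical and are not results from the paper.

```python
import math

def radar_area(values):
    """Area of a radar-chart polygon whose axes are equally spaced around the origin."""
    n = len(values)
    if n < 3:
        raise ValueError("a radar polygon needs at least three axes")
    angle = 2 * math.pi / n
    # Sum the areas of the triangles formed by each pair of adjacent axes.
    return 0.5 * math.sin(angle) * sum(
        values[i] * values[(i + 1) % n] for i in range(n)
    )

# Made-up per-task AUC values, used only to show the call.
example_aucs = [0.80, 0.70, 0.90, 0.60, 0.75]
print(radar_area(example_aucs))
```

Because the area depends on products of neighbouring values, it changes with the order in which tasks are placed around the chart, so areas are only comparable between models plotted with the same axis order.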
