
How Effective Can Dropout Be in Multiple Instance Learning?

Wenhui Zhu, Peijie Qiu, Xiwen Chen, Zhangsihao Yang, Aristeidis Sotiras, Abolfazl Razi, Yalin Wang

TL;DR

This work investigates dropout in multiple instance learning (MIL), focusing on histological whole-slide image (WSI) classification, where training is typically two-stage and feature embeddings are noisy. It shows that dropping the top-k most important instances in a bag (top-k DropInstance) reduces gradient direction error and promotes flatter, more generalizable minima. This insight motivates MIL-Dropout, a MIL-specific dropout method. MIL-Dropout ranks instance importance with a non-parametric, averaging-based attention, uses a query-based mechanism to drop the top-k instances together with the instances most similar to them, and applies normalization to stabilize training. On five MIL benchmarks and two WSI datasets, it yields consistent, substantial gains across diverse MIL aggregators at negligible computational cost, and ablations on hyperparameters and lesion localization analyses complement these results. The findings offer both theoretical and practical contributions to regularizing MIL in challenging, weakly supervised settings such as digital pathology.
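To make the selection step concrete, here is a minimal pure-Python sketch of how MIL-Dropout might choose which instances to drop from a single bag. It is not the authors' implementation (their code is in the repository linked in the abstract), and several details are assumptions: the importance score is taken to be the mean of each instance's feature entries (one possible reading of "non-parametric averaging-based attention"), similarity is cosine similarity, and the hypothetical parameters `k` (number of top-ranked instances) and `s` (number of similar instances dropped per query) are illustrative names. The normalization step is not reproduced.

```python
import math
from typing import List, Sequence, Set

Vector = Sequence[float]


def averaging_scores(bag: Sequence[Vector]) -> List[float]:
    """ASSUMPTION: non-parametric importance = mean of each instance's features."""
    return [sum(h) / len(h) for h in bag]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)


def mil_dropout_indices(bag: Sequence[Vector], k: int, s: int) -> Set[int]:
    """Hypothetical helper: indices to drop = top-k scored instances (queries)
    plus, for each query, its s most similar other instances."""
    scores = averaging_scores(bag)
    # Rank all instances by importance score, highest first.
    order = sorted(range(len(bag)), key=lambda i: scores[i], reverse=True)
    queries = order[:k]
    dropped = set(queries)
    for q in queries:
        # Rank the remaining instances by similarity to the query instance.
        others = [j for j in range(len(bag)) if j != q]
        others.sort(key=lambda j: cosine(bag[q], bag[j]), reverse=True)
        dropped.update(others[:s])
    return dropped


def mil_dropout(bag: Sequence[Vector], k: int, s: int,
                training: bool = True) -> List[Vector]:
    """Drop instances only in training mode; at inference the bag is unchanged."""
    if not training:
        return list(bag)
    drop = mil_dropout_indices(bag, k, s)
    kept = [h for i, h in enumerate(bag) if i not in drop]
    # Guard: never hand an empty bag to the aggregator.
    return kept if kept else list(bag)
```

For example, with `bag = [[0.9, 0.8], [0.85, 0.75], [0.1, -0.2], [-0.3, 0.4], [0.0, 0.05]]`, `k=1` and `s=1`, the sketch drops instance 0 (highest mean feature) and instance 1 (its nearest neighbour by cosine similarity), leaving three instances for the aggregator. Presumably, dropping the near-duplicates of each top instance, not only the instance itself, prevents the aggregator from simply shifting its attention to an almost identical instance.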

Abstract

Multiple Instance Learning (MIL) is a popular weakly supervised method for various applications, with particular interest in histological whole slide image (WSI) classification. Due to the gigapixel resolution of WSIs, applying MIL to WSIs typically requires a two-stage training scheme: first, features are extracted with a pre-trained backbone, and then MIL aggregation is performed on them. However, this suboptimal scheme is well known to suffer from "noisy" feature embeddings from the backbone and from inherently weak supervision, which hinder MIL from learning rich and generalizable features. Nevertheless, dropout, the most commonly used technique for mitigating such issues, has not yet been explored in MIL. In this paper, we empirically explore how effective dropout can be in MIL. Interestingly, we observe that dropping the top-k most important instances within a bag leads to better performance and generalization, even under noise attacks. Based on this key observation, we propose a novel MIL-specific dropout method, termed MIL-Dropout, which systematically determines which instances to drop. Experiments on five MIL benchmark datasets and two WSI datasets demonstrate that MIL-Dropout boosts the performance of current MIL methods at negligible computational cost. The code is available at https://github.com/ChongQingNoSubway/MILDropout.
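The core observation, dropping the top-k most important instances of a bag rather than random ones, can be illustrated with the minimal sketch below. It assumes per-instance importance scores are already available (for example, attention weights produced by an aggregator such as ABMIL). The function names are hypothetical, and the random variant is a generic instance-level dropout baseline, not a specific strategy taken from the paper.

```python
import random
from typing import List, Sequence


def topk_drop_instance(scores: Sequence[float], k: int) -> List[int]:
    """Return indices of the instances kept after removing the k highest-scoring ones."""
    k = min(k, len(scores) - 1)  # always keep at least one instance
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    dropped = set(ranked[:k])
    return [i for i in range(len(scores)) if i not in dropped]


def random_drop_instance(n: int, p: float, rng: random.Random) -> List[int]:
    """Baseline: drop each of n instances independently with probability p."""
    kept = [i for i in range(n) if rng.random() >= p]
    # Guard against an empty bag by keeping one random instance.
    return kept if kept else [rng.randrange(n)]
```

For instance, `topk_drop_instance([0.05, 0.60, 0.10, 0.25], k=2)` removes instances 1 and 3 and returns `[0, 2]`. Both functions return kept indices, so they could be used to subset a bag of frozen features before aggregation during training; as with standard dropout, all instances would presumably be kept at inference.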

Paper Structure

This paper contains 44 sections, 16 equations, 8 figures, 7 tables, and 1 algorithm.

Figures (8)

  • Figure 1: An illustrative example comparing the convergence trajectories of the baseline ABMIL without dropout (Left) and ABMIL with the proposed MIL-Dropout (Right). ABMIL without dropout tends to follow an incorrect gradient direction early in training and eventually converges to a sharp minimum. In contrast, ABMIL with MIL-Dropout typically achieves a lower gradient direction error and reaches a flatter minimum with better generalization.
  • Figure 2: A conceptual illustration of flat and sharp minima for a 1D curve (Left) and a 2D landscape (Right) of the loss function $\mathcal{L}_\theta$.
  • Figure 3: The landscape of the loss function $\mathcal{L}_{\theta}$ under two dropout strategies, (a) DropNeuron and (b) DropInstance, and (c) the performance of MIL models under different noise attacks. The Euclidean ball $\mathcal{B}_2(\epsilon, \theta^*)$ around the optimal parameter $\theta^*$ (see Eq. \ref{eq:flat1}) is marked with a red circle in subpanels (a) and (b). The landscape of $\mathcal{L}_\theta$ under DropInstance has flatter minima than under DropNeuron, which also translates into a higher AUC.
  • Figure 4: Comparison of the gradient direction error (GDE) over the first 10,000 iterations (Left), and of training loss (line plot) and AUC (bar plot) under different instance dropout strategies (Right); the area under GDE is the area enclosed by the GDE curve and the x-axis. Dropping the top-k instances yields the smallest GDE and training loss and the highest AUC among all four strategies.
  • Figure 5: Ablation studies on the number of top-k instances $K$ (a) and the number of similar instances $S$ (b) on the CAMELYON16 and TCGA-NSCLC datasets. (c) Attention maps from ABMIL without and with MIL-Dropout, with tumor regions outlined in red. Brighter cyan in columns two and three indicates a higher tumor probability (higher attention score) at the corresponding location.
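Figures 1 to 4 rely on two diagnostics: the flatness of a minimum within the Euclidean ball $\mathcal{B}_2(\epsilon, \theta^*)$ and the gradient direction error (GDE). The paper's exact definitions are not reproduced in this summary, so the sketch below uses common formulations as assumptions: sharpness as the worst-case loss increase $\max_{\|\delta\|_2 \le \epsilon} \mathcal{L}_{\theta^*+\delta} - \mathcal{L}_{\theta^*}$, estimated by probing random points on the sphere of radius $\epsilon$, and GDE as the cosine distance between a mini-batch gradient and the full-dataset gradient. The function names are hypothetical.

```python
import math
import random
from typing import Callable, Sequence


def estimate_sharpness(loss: Callable[[Sequence[float]], float],
                       theta_star: Sequence[float], eps: float,
                       n_samples: int = 256, seed: int = 0) -> float:
    """ASSUMED definition: Monte Carlo estimate of the worst-case loss increase
    over the L2 ball of radius eps around theta_star, probing only its sphere.
    The result is clamped at 0 and is a lower bound on the true maximum."""
    rng = random.Random(seed)
    base = loss(theta_star)
    worst = 0.0
    for _ in range(n_samples):
        # Random Gaussian direction, rescaled to length eps.
        d = [rng.gauss(0.0, 1.0) for _ in theta_star]
        norm = math.sqrt(sum(x * x for x in d)) or 1.0
        probe = [t + eps * x / norm for t, x in zip(theta_star, d)]
        worst = max(worst, loss(probe) - base)
    return worst


def gradient_direction_error(g_batch: Sequence[float],
                             g_full: Sequence[float]) -> float:
    """ASSUMED definition: 1 - cosine similarity between a mini-batch gradient
    and the full-dataset gradient (0 = same direction, 2 = opposite)."""
    dot = sum(a * b for a, b in zip(g_batch, g_full))
    na = math.sqrt(sum(a * a for a in g_batch))
    nb = math.sqrt(sum(b * b for b in g_full))
    return 1.0 - dot / (na * nb + 1e-12)
```

On a toy 1D quadratic, `estimate_sharpness(lambda t: 0.1 * t[0] ** 2, [0.0], 0.5)` returns about 0.025, while the steeper `lambda t: 10.0 * t[0] ** 2` returns about 2.5, so a smaller value corresponds to a flatter minimum. Probing only the sphere gives a lower bound in general; for convex losses such as these quadratics, the maximum over the ball lies on the sphere.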