PathBench-MIL: A Comprehensive AutoML and Benchmarking Framework for Multiple Instance Learning in Histopathology

Siemen Brussee, Pieter A. Valkema, Jurre A. J. Weijer, Thom Doeleman, Anne M. R. Schrader, Jesper Kers

TL;DR

The paper tackles the challenge of evaluating and optimizing end-to-end MIL pipelines for histopathology WSIs under weak supervision. It introduces PathBench-MIL, an open-source framework built atop SlideFlow that supports end-to-end MIL workflows, including preprocessing, tiling, feature extraction, aggregation, and AutoML-based pipeline search. Key contributions include an Optuna-based AutoML engine with budget-aware pruning, a unified YAML configuration for reproducibility, and an interactive Dash visualization tool for exploring results. The framework enables dataset-aware, systematic comparisons across MIL configurations and tasks (classification, regression, survival), aiming to standardize evaluation practices in pathology AI.

Abstract

We introduce PathBench-MIL, an open-source AutoML and benchmarking framework for multiple instance learning (MIL) in histopathology. The system automates end-to-end MIL pipeline construction, including preprocessing, feature extraction, and MIL-aggregation, and provides reproducible benchmarking of dozens of MIL models and feature extractors. PathBench-MIL integrates visualization tooling, a unified configuration system, and modular extensibility, enabling rapid experimentation and standardization across datasets and tasks. PathBench-MIL is publicly available at https://github.com/Sbrussee/PathBench-MIL

Paper Structure

This paper contains 19 sections, 5 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: Modular MIL pipeline illustrating the combinatorial search space of histopathology MIL frameworks. Variations in preprocessing, tiling, stain normalization, feature extractors (including foundation models), and aggregation strategies yield a large set of candidate pipelines that PathBench-MIL benchmarks and optimizes end-to-end.
  • Figure 2: Overview of the PathBench-MIL framework. (A) Required inputs: whole-slide images (WSIs), a configuration file specifying pipeline components, and slide-level annotations. (B) The end-to-end MIL pipeline, including quality control (QC), tile extraction, stain normalization, tile-level feature extraction, and slide-level aggregation to produce final predictions. (C) Optimization mode: an AutoML engine samples full pipeline configurations, evaluates each configuration, and stores results for interactive visualization. (D) Benchmarking mode: all user-specified pipeline combinations are enumerated and evaluated, enabling systematic comparison across MIL design choices.
  • Figure 3: Efficiency mechanisms in PathBench-MIL. (a) The AutoML optimization workflow. For each trial $t$ within the total budget $T$ ($t < T$), a pipeline configuration $c$ is sampled from the search space $SS$ ($c \sim SS$). Conditional checks (diamond nodes) test for existing data artifacts; if tiles or features for configuration $c$ already exist, extraction is skipped. The model is then trained and evaluated. If the trial is not pruned (dashed line), its results are saved. The loop repeats until the budget is exhausted ($t \ge T$), after which the best-performing configuration is reported. (b) Visualization of the Hyperband pruning strategy, showing the successive halving of trials. After a warm-up phase (gray dashed lines), underperforming trials (red/orange) are terminated at specific checkpoints ("rungs"), reserving resources for the most promising candidates (blue).
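
The loop described in Figure 3 can be illustrated with a small, self-contained Python sketch. It is not PathBench-MIL's code and does not use Optuna: the search space, the option names, the `simulated_score` function, and the rung schedule are all hypothetical stand-ins. The pruning rule is a simplified asynchronous successive-halving check, roughly one bracket of what Hyperband runs, chosen only to show how artifact reuse and rung-based pruning fit into a budgeted trial loop.

```python
import random

# Hypothetical search space; option names are illustrative, not PathBench-MIL's.
SEARCH_SPACE = {
    "tile_px": [256, 512],
    "extractor": ["extractor_a", "extractor_b"],
    "aggregator": ["mean_pool", "max_pool", "attention_mil"],
}
RUNGS = (1, 3, 9)  # epochs at which a trial is compared against earlier trials
ETA = 3            # keep roughly the top 1/ETA of trials seen at each rung


def sample_config(rng):
    """Draw one pipeline configuration c from the search space."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}


def simulated_score(config, epoch, rng):
    """Stand-in for training: a noisy score that saturates with more epochs."""
    quality = 0.5
    quality += 0.1 * SEARCH_SPACE["aggregator"].index(config["aggregator"])
    quality += 0.05 * SEARCH_SPACE["extractor"].index(config["extractor"])
    return quality * (1 - 0.5 ** epoch) + rng.uniform(-0.02, 0.02)


def should_prune(score, rung_scores):
    """Record the score, then prune unless it is in the top 1/ETA at this rung."""
    rung_scores.append(score)
    ranked = sorted(rung_scores, reverse=True)
    keep = max(1, len(ranked) // ETA)
    return score < ranked[keep - 1]


def run_search(budget, seed=0):
    rng = random.Random(seed)
    tile_cache, feature_cache = set(), set()
    rung_history = {rung: [] for rung in RUNGS}
    stats = {"tile_extractions": 0, "feature_extractions": 0, "pruned": 0}
    results = []

    for _ in range(budget):  # loop while t < T
        config = sample_config(rng)

        # Skip tile extraction if tiles at this size already exist.
        tile_key = config["tile_px"]
        if tile_key not in tile_cache:
            tile_cache.add(tile_key)
            stats["tile_extractions"] += 1

        # Skip feature extraction if features for (tiles, extractor) exist.
        feature_key = (tile_key, config["extractor"])
        if feature_key not in feature_cache:
            feature_cache.add(feature_key)
            stats["feature_extractions"] += 1

        # Train epoch by epoch, checking the pruning rule at each rung.
        pruned = False
        score = None
        for epoch in range(1, RUNGS[-1] + 1):
            score = simulated_score(config, epoch, rng)
            if epoch in rung_history and should_prune(score, rung_history[epoch]):
                pruned = True
                break

        if pruned:
            stats["pruned"] += 1
        else:
            results.append((score, config))  # only unpruned trials are saved

    best = max(results, key=lambda item: item[0])
    return best, stats, results


if __name__ == "__main__":
    best, stats, _ = run_search(budget=30)
    print("best:", best)
    print("stats:", stats)
```

Because the caches are keyed on the upstream choices only, the number of tile and feature extractions is bounded by the number of distinct tile sizes and (tile size, extractor) pairs rather than by the trial budget. A real setup would persist these artifacts on disk and delegate sampling and pruning to an AutoML library.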