Extreme Learning Machines for Attention-based Multiple Instance Learning in Whole-Slide Image Classification

Rajiv Krishnakumar, Julien Baglio, Frederik F. Flöther, Christian Ruiz, Stefan Habringer, Nicole H. Romano

TL;DR

This work systematically analyzes how attention-architecture choices in attention-based MIL affect performance on biomedical imagery, and introduces an attention-based Extreme MIL model that fixes most parameters to greatly reduce training effort while maintaining competitive accuracy. It demonstrates that nonlinear attention and higher-dimensional feature transformations markedly improve robustness and the detection of rare cells, with domain-specific pre-processing further boosting performance. The Extreme MIL framework achieves results comparable to deep MIL with substantially fewer trainable parameters, suggesting practical benefits for clinical diagnostics and enabling scalable computation. The paper also outlines future directions, including quantum extensions (QELMs) and multi-head attention, to further improve accuracy and efficiency in single-cell and slide-level classification.
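The idea of fixing most parameters can be illustrated with a generic extreme learning machine (ELM): the hidden layer is drawn at random and frozen, and only the output weights are fitted, here in closed form by ridge regression. This is a minimal sketch of the textbook ELM, not the authors' Extreme MIL; which parameters the paper freezes and how it trains the rest are not stated in this overview. The function names `elm_fit` and `elm_predict`, the tanh activation, the ridge term, and the toy data are all assumptions for illustration.

```python
import numpy as np

def elm_fit(X, y, n_hidden=64, reg=1e-3, seed=0):
    """Textbook ELM (hypothetical helper): random fixed hidden layer + ridge readout."""
    rng = np.random.default_rng(seed)
    # Hidden weights and biases are sampled once and never updated.
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)  # fixed random nonlinear feature map
    # Only the output weights are learned: solve (H^T H + reg*I) beta = H^T y.
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ y)
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Apply the frozen feature map, then the learned linear readout."""
    return np.tanh(X @ W + b) @ beta

# Toy usage on synthetic data (not the paper's data); labels are +1 / -1.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = np.sign(X[:, 0] * X[:, 1])
W, b, beta = elm_fit(X, y)
pred = elm_predict(X, W, b, beta)
n_fixed = W.size + b.size  # 2*64 + 64 = 192 parameters that are never trained
n_trained = beta.size      # 64 parameters fitted in closed form
```

The design choice is the trade-off the TL;DR describes: most of the network's capacity comes from the frozen random projection, so training reduces to a single linear solve over a much smaller set of parameters.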

Abstract

Whole-slide image classification represents a key challenge in computational pathology and medicine. Attention-based multiple instance learning (MIL) has emerged as an effective approach for this problem. However, the effect of the attention mechanism architecture on model performance is not well-documented for biomedical imagery. In this work, we compare different methods and implementations of MIL, including deep learning variants. We introduce a new method using higher-dimensional feature spaces for deep MIL. We also develop a novel algorithm for whole-slide image classification where extreme learning machines are combined with attention-based MIL to improve sensitivity and reduce training complexity. We apply our algorithms to the problem of detecting circulating rare cells (CRCs), such as erythroblasts, in peripheral blood. Our results indicate that nonlinearities play a key role in the classification, as removing them leads to a sharp decrease in stability in addition to a decrease in average area under the curve (AUC) of over 4%. We also demonstrate a considerable increase in the robustness of the model, with improvements of over 10% in average AUC when higher-dimensional feature spaces are leveraged. In addition, we show that extreme learning machines can offer clear improvements in training efficiency, reducing the number of trained parameters by a factor of 5 whilst still keeping the average AUC within 1.5% of the deep MIL model. Finally, we discuss options for enriching the classical computing framework with quantum algorithms in the future. This work can thus help pave the way towards more accurate and efficient single-cell diagnostics, one of the building blocks of precision medicine.
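To make the role of nonlinearities in attention concrete, the sketch below pools a bag of instance embeddings with three attention scoring functions in the style of Ilse et al. (2018): a purely linear score, a tanh score, and a gated (tanh times sigmoid) score. It assumes these are representative of the architectures compared in the paper; the exact variants, dimensions, and the helper name `attention_pool` are hypothetical. The random weights stand in for trained ones.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def attention_pool(H, V, w, U=None, mode="tanh"):
    """Pool a bag H (n_instances x d) into one d-vector (hypothetical helper).

    mode="linear": score_k = w^T (V h_k)                      (no nonlinearity)
    mode="tanh":   score_k = w^T tanh(V h_k)
    mode="gated":  score_k = w^T (tanh(V h_k) * sigmoid(U h_k))
    """
    A = H @ V.T  # (n, L) projected instances
    if mode == "linear":
        G = A
    elif mode == "tanh":
        G = np.tanh(A)
    elif mode == "gated":
        if U is None:
            raise ValueError("gated attention needs U")
        G = np.tanh(A) * (1.0 / (1.0 + np.exp(-(H @ U.T))))
    else:
        raise ValueError(f"unknown mode {mode!r}")
    a = softmax(G @ w)  # attention weights over the bag, summing to 1
    return a @ H, a     # attention-weighted bag embedding and the weights

# Usage: one bag of 50 instances with 8-d features (synthetic, hypothetical sizes).
rng = np.random.default_rng(0)
d, L = 8, 16
H = rng.normal(size=(50, d))
V = rng.normal(size=(L, d))
U = rng.normal(size=(L, d))
w = rng.normal(size=L)
results = {m: attention_pool(H, V, w, U, mode=m) for m in ("linear", "tanh", "gated")}
```

In the linear mode the score collapses to a single fixed projection of each instance, (V^T w)^T h_k, so the mechanism can only rank instances along one direction; the tanh and gated variants allow non-monotonic, feature-dependent weighting. All three are permutation-invariant over the bag, which is what MIL requires.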

Paper Structure

This paper contains 25 sections, 9 equations, 14 figures, 4 tables.

Figures (14)

  • Figure 1: a) The pipeline of going from whole slide image to classification and b) a comparison of the attention mechanism architectures investigated. When detecting rare cell types, the performance of a multiple instance learning (MIL) model relies heavily on the pooling operation. Ideally the signal from even a single cell of interest is propagated, regardless of the bag size.
  • Figure 2: The Blood-MNIST training dataset. The top figure shows a sample of some of the individual training image data along with their class labels. The bottom figure is the histogram of the training dataset class labels.
  • Figure 3: The first and second t-SNE components of the individually pre-processed training feature vectors. Pre-processing consists of passing each individual cell image through a ResNet-18 network pre-trained on ImageNet data (top) or on the BloodMNIST data (bottom), and then keeping the first 8 principal components of the output vectors.
  • Figure 4: Cumulative variance explained by the principal components of the ResNet-18 feature vectors (pre-trained on ImageNet) of the Blood-MNIST images. The values and their standard deviations (shown as barely visible error bars) were computed by taking the principal components over 20 different random permutations of the training dataset. A dotted line is drawn at 0.8 to guide the eye.
  • Figure 5: Comparison of deep and gated deep MIL models to the logistic regression baseline model. The accuracy, sensitivity, specificity, and AUC are measured for models trained on either natural imagery ("generic") or domain-specific ("specialized") data. The lines and bands for all models represent the calculated means and standard deviations respectively over 20 different random permutations of the train-validation split.
  • ...and 9 more figures
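The feature pre-processing described in the Figure 3 and Figure 4 captions (ResNet-18 features reduced to their first 8 principal components, plus a cumulative explained-variance curve with mean and standard deviation over 20 random permutations) can be sketched with NumPy's SVD. Because PCA does not depend on row order, this sketch assumes each permutation is used to draw a random training subset; `train_frac` is a hypothetical parameter, as are the helper names. Synthetic low-rank data stands in for the real 512-d ResNet-18 pooled features.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance carried by each principal component."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, descending
    var = s ** 2
    return var / var.sum()

def pca_project(X, k=8):
    """Project centred data onto its first k principal components."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return (X - mean) @ Vt[:k].T

def cumulative_variance_over_splits(X, n_splits=20, train_frac=0.8, seed=0):
    """Mean and std of the cumulative explained-variance curve over random subsets."""
    rng = np.random.default_rng(seed)
    n_train = int(train_frac * len(X))
    curves = []
    for _ in range(n_splits):
        idx = rng.permutation(len(X))[:n_train]  # assumed: random training subset
        curves.append(np.cumsum(explained_variance_ratio(X[idx])))
    curves = np.array(curves)
    return curves.mean(axis=0), curves.std(axis=0)

# Synthetic stand-in for 512-d ResNet-18 features: rank-10 signal plus small noise.
rng = np.random.default_rng(1)
latent = rng.normal(size=(300, 10))
X = latent @ rng.normal(size=(10, 512)) + 0.1 * rng.normal(size=(300, 512))

mean_curve, std_curve = cumulative_variance_over_splits(X)
# Smallest number of components whose mean cumulative variance reaches 0.8.
n80 = int(np.searchsorted(mean_curve, 0.8) + 1)
Z = pca_project(X, k=8)  # 8-d features per cell, as in the Figure 3 caption
```

On real features the number of components needed to cross the 0.8 line depends on the backbone and dataset; the small standard deviations reported in Figure 4 would correspond to `std_curve` staying near zero.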