Topologically Regularized Multiple Instance Learning to Harness Data Scarcity

Salome Kazeminia, Carsten Marr, Bastian Rieck

TL;DR

Adding a topological regularization term to MIL provides a shape-preserving inductive bias that compels the encoder to maintain the essential geometrical-topological structure of input bags during projection into latent space. This enhances the performance and generalization of the MIL classifier regardless of the aggregation function, particularly when training data are scarce.

Abstract

In biomedical data analysis, Multiple Instance Learning (MIL) models have emerged as a powerful tool to classify patients' microscopy samples. However, the data-intensive requirement of these models poses a significant challenge in scenarios with scarce data availability, e.g., in rare diseases. We introduce a topological regularization term to MIL to mitigate this challenge. It provides a shape-preserving inductive bias that compels the encoder to maintain the essential geometrical-topological structure of input bags during projection into latent space. This enhances the performance and generalization of the MIL classifier regardless of the aggregation function, particularly for scarce training data. The effectiveness of our method is confirmed through experiments across a range of datasets, showing an average enhancement of 2.8% for MIL benchmarks, 15.3% for synthetic MIL datasets, and 5.5% for real-world biomedical datasets over the current state-of-the-art.

Paper Structure

This paper contains 23 sections, 15 equations, 8 figures, and 6 tables.

Figures (8)

  • Figure 1: Topologically Regularized Multiple Instance Learning (TR-MIL). We calculate the distance matrix of the input instances $x_i$ inside each bag $X_{b_{m}}$ and then apply persistent homology based on the Vietoris-Rips complex, treating each bag of $n$ instances as a point cloud. We apply the same process to the latent feature vectors of each bag. From the resulting shape descriptors (persistence diagrams) of the image-space and latent-space representations of the bag, we calculate a topological regularization loss ($L_\mathrm{topo}$) and combine it with the standard MIL loss ($L_\mathrm{class}$).
  • Figure 2: TR-RGMIL preserves the topology of toy instances sampled from a hypersphere when projecting them to the 2D latent space (a). This leads to more distinct latent representations of bags (b) and $30\%$ higher classification accuracy compared to RGMIL.
  • Figure 3: TR-MIL outperforms MIL models irrespective of the aggregation function when training data are scarce. For each number of training bags, the mean and standard deviation of the F1-score are reported over $5$ runs for each bag size of $10$, $50$, and $100$ ($15$ runs in total).
  • Figure 4: Topological regularization improves the generalizability of the RGMIL model when training data are scarce. Each column shows learning curves of models trained with $10$ bags, each containing $10$ instances on average.
  • Figure 5: Topological regularization enhances the model's ability to identify disease-relevant cells. TR-MIL Anomaly yields more uniform anomaly scores for deformed cells, in contrast to the varied scores produced by MIL Anomaly.
  • ...and 3 more figures
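
The pipeline described in the Figure 1 caption (a distance matrix per bag, persistent homology on the Vietoris-Rips filtration, and a loss that compares image space with latent space) can be illustrated with a minimal sketch in plain Python. Several points are assumptions made for illustration and are not taken from the paper: the sketch is restricted to 0-dimensional homology, where the topologically relevant edges are exactly the edges of a minimum spanning tree; the discrepancy compares the lengths of those edges in the two spaces; and the function names and the `weight` parameter are hypothetical. The exact form of $L_\mathrm{topo}$ used by the authors may differ.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distance matrix of one bag, treated as a point cloud."""
    return [[math.dist(p, q) for q in points] for p in points]


def zero_dim_persistence_pairs(dist):
    """Edges that merge connected components in the Vietoris-Rips filtration.

    For 0-dimensional persistent homology these are the edges of a minimum
    spanning tree, found here with Kruskal's algorithm and union-find.
    """
    n = len(dist)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(combinations(range(n), 2), key=lambda e: dist[e[0]][e[1]])
    pairs = []
    for i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # the two components merge at this edge
            pairs.append((i, j))
    return pairs


def topo_discrepancy(inputs, latents):
    """Illustrative stand-in for L_topo (assumed form, not the authors' code).

    Compares the lengths of topologically relevant edges between the input
    bag and its latent representation, in both directions.
    """
    d_x = pairwise_distances(inputs)
    d_z = pairwise_distances(latents)
    loss_x = sum((d_x[i][j] - d_z[i][j]) ** 2
                 for i, j in zero_dim_persistence_pairs(d_x))
    loss_z = sum((d_z[i][j] - d_x[i][j]) ** 2
                 for i, j in zero_dim_persistence_pairs(d_z))
    return 0.5 * (loss_x + loss_z)


def total_loss(class_loss, topo_loss, weight=1.0):
    """Combine L_class and L_topo; `weight` is a hypothetical trade-off knob."""
    return class_loss + weight * topo_loss


# Toy bag: three instances on a line, and a latent embedding that doubles
# every pairwise distance.
bag = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
latent = [(0.0,), (2.0,), (6.0,)]
print(topo_discrepancy(bag, bag))     # identical structure gives 0.0
print(topo_discrepancy(bag, latent))  # stretched distances are penalized
```

In an actual training setup the same computation would run on differentiable tensors so that gradients reach the encoder, and higher-dimensional topological features would need a persistent homology library rather than a spanning tree.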