Every Error has Its Magnitude: Asymmetric Mistake Severity Training for Multiclass Multiple Instance Learning

Sungrae Hong, Jiwon Jeong, Jisu Shin, Donghee Han, Sol Lee, Kyungeun Kim, Mun Yong Yi

Abstract

Multiple Instance Learning (MIL) has emerged as a promising paradigm for Whole Slide Image (WSI) diagnosis, offering effective learning with limited annotations. However, existing MIL frameworks overlook diagnostic priorities and fail to differentiate the severity of misclassifications in multiclass settings, leaving clinically critical errors unaddressed. We propose a mistake-severity-aware training strategy that organizes diagnostic classes into a hierarchical structure, with each level optimized using a severity-weighted cross-entropy loss that penalizes high-severity misclassifications more strongly. Additionally, hierarchical consistency is enforced through probabilistic alignment, a semantic feature remix applied to the instance bag to robustly train class priority and accommodate clinical cases involving multiple symptoms. An asymmetric Mikel's Wheel-based metric is also introduced to quantify the severity of errors specific to medical fields. Experiments on challenging public and real-world in-house datasets demonstrate that our approach significantly mitigates critical errors in MIL diagnosis compared to existing methods. We present additional experimental results on natural domain data to demonstrate the generalizability of our proposed method beyond medical contexts.

Paper Structure (32 sections, 11 equations, 15 figures, 7 tables, 1 algorithm)

Figures (15)

  • Figure 1: Label characteristics of multiclass WSIs and the risks of applying MIL in the medical domain without accounting for mistake severity. (a) While every object in the natural domain is typically labeled, a WSI is assigned only the most urgent diagnosis among all observed findings. (b) Let $\prec$ denote a more urgent diagnosis. Although model MIL-A would traditionally be considered "better" than MIL-B because of its higher accuracy, it makes more severe misclassifications, which are shown as colored squares. This comparison highlights that conventional MIL approaches can lead to unreliable assessments in medical applications.
  • Figure 2: Structured class relationships from the finest hierarchy $\mathcal{H}$ to the root $\mathcal{R}$.
  • Figure 3: Conceptual description of SFR. Dashed figures represent unique symptoms in $Y_a$, where all figures are unlabeled and unknown. SFR extracts target cases and generates synthetic samples using only the available label $Y_a$.
  • Figure 4: Two types of Mikel's Wheel, where $\bullet$ is the true class and $\blacktriangleright$ is the predicted class. (a) In the symmetric wheel, the distance between two classes $i$ and $j$ is their absolute difference. (b) The proposed asymmetric wheel imposes a penalty $P$ for misclassifying a high-priority true class.
  • Figure 5: Class hierarchy diagram of the in-house and BRACS (Brancati et al., 2022) datasets. In-house classes are shown in italics, whereas BRACS classes are underlined. We denote the class priority in each hierarchy using $\prec$ and $\equiv$.
  • ...and 10 more figures
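
The distance described in the Figure 4 caption can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that classes are integer indices ordered by priority, with a smaller index meaning a more urgent diagnosis, and that the penalty $P$ is added whenever a more urgent true class is predicted as a less urgent one. The function names and the default penalty value are hypothetical.

```python
from typing import Sequence


def symmetric_distance(true_cls: int, pred_cls: int) -> float:
    """Symmetric wheel: the distance is the absolute difference of class indices."""
    return float(abs(true_cls - pred_cls))


def asymmetric_distance(true_cls: int, pred_cls: int, penalty: float = 1.0) -> float:
    """Asymmetric wheel (sketch): symmetric distance plus a penalty P.

    Assumption (not from the source): a smaller index is a more urgent class,
    and the penalty applies only when the true class is more urgent than the
    predicted one, i.e. when an urgent diagnosis is under-called.
    """
    distance = symmetric_distance(true_cls, pred_cls)
    if true_cls < pred_cls:
        distance += penalty
    return distance


def mean_severity(y_true: Sequence[int], y_pred: Sequence[int], penalty: float = 1.0) -> float:
    """Average asymmetric distance over a set of predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one prediction is required")
    total = sum(asymmetric_distance(t, p, penalty) for t, p in zip(y_true, y_pred))
    return total / len(y_true)
```

Under these assumptions, predicting class 2 for a true class 0 costs more than the reverse error, although both have the same symmetric distance. That asymmetry is the property the caption attributes to the proposed wheel.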