Ordinal Multiple-instance Learning for Ulcerative Colitis Severity Estimation with Selective Aggregated Transformer

Kaito Shiku, Kazuya Nishimura, Daiki Suehiro, Kiyohito Tanaka, Ryoma Bise

TL;DR

This paper proposes a patient-level severity estimation method based on a transformer with selective aggregator tokens. As in a clinical setting, a single severity label is estimated from multiple images taken from a patient, and the selective aggregation improves the ability to discriminate between adjacent severity classes.

Abstract

Patient-level diagnosis of severity in ulcerative colitis (UC) is common in real clinical settings, where the most severe score in a patient is recorded. However, previous UC classification methods (i.e., image-level estimation) mainly assumed that the input was a single image, so they cannot utilize the severity labels recorded in real clinical settings. In this paper, we propose a patient-level severity estimation method based on a transformer with selective aggregator tokens, where a severity label is estimated from multiple images taken from a patient, similar to a clinical setting. Our method effectively aggregates features of severe parts from the set of images captured for each patient, which improves the ability to discriminate between adjacent severity classes. Experiments on two datasets demonstrate the effectiveness of the proposed method compared with state-of-the-art MIL methods. Moreover, we evaluated our method in real clinical settings and confirmed that it outperformed previous image-level methods. The code is publicly available at https://github.com/Shiku-Kaito/Ordinal-Multiple-instance-Learning-for-Ulcerative-Colitis-Severity-Estimation.

Paper Structure

This paper contains 11 sections, 3 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: Patient-level diagnosis of severity in ulcerative colitis (UC). In real clinical settings, approximately 20 to 40 endoscopic images are captured from each patient, and the severity score for the most severe area of the images is recorded as the patient-level diagnosis. In this process, the severity score for each image is not recorded.
  • Figure 2: Ideal feature aggregation for max severity estimation in ordinal MIL. The patient-level severity is defined based on the label of the most severe instance within the bag. In this setting, the ideal aggregation involves aggregating features from the instances with the highest severity label within a bag.
  • Figure 3: Illustration of the effect of our selective aggregation. A single aggregator token, as used previously, may attend to many instances, not only the severe ones, resulting in non-discriminative bag-level features. In our approach, each selective aggregator token corresponds to a different severity level, with the $k$-th token aggregating instance features that satisfy $\{Y^i>k\}$. Each token can effectively aggregate the severe instance features for its severity level, producing discriminative bag-level features that distinguish between adjacent severity classes.
  • Figure 4: Overview of Selective Aggregated Transformer. First, instance features $\bm{e}_j^i$ are extracted from each image. Then, the features are aggregated by selective aggregator tokens to obtain the bag-level features $\bm{a}_1^i,...,\bm{a}_{K-1}^i$, which focus on each class boundary. Finally, rank predictions are obtained by applying binary classifiers for each aggregated feature.
  • Figure 5: Selective aggregation. The $k$-th selective aggregator token $\tilde{\bm{t}}_j^{i}$ aggregates the instance features $\tilde{\bm{e}}_j^{i}$ to extract a discriminative bag-level feature. To discriminate $Y^i>{k}$ effectively, instance features with severity above $k$ should be aggregated, while instances with severity less than or equal to $k$ should be ignored. Training therefore raises the attention on severe instances above $k$ (red) and lowers it on the others (white).
  • ...and 4 more figures
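
The captions above describe three ideas: the bag (patient) label is the maximum instance severity, each boundary $k$ gets its own binary decision $Y^i>k$, and each aggregator token pools instance features by attention. The following is a minimal Python sketch of those ideas, not the authors' implementation. It assumes severity labels are integers in 0..K-1 with boundaries indexed from 0, and it replaces the transformer with a single-head dot-product attention pooling; all function names are hypothetical.

```python
import math


def bag_label(instance_labels):
    """Patient-level label: the most severe instance label in the bag."""
    return max(instance_labels)


def ordinal_targets(label, num_classes):
    """Encode a label Y in {0..K-1} as K-1 binary targets [Y > k] (assumed 0-based k)."""
    return [1 if label > k else 0 for k in range(num_classes - 1)]


def decode_rank(probs, threshold=0.5):
    """Predicted rank: number of boundary classifiers that predict Y > k."""
    return sum(1 for p in probs if p > threshold)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(token, instances):
    """Pool instance features with one token via scaled dot-product attention.

    Simplified stand-in for one selective aggregator token (an assumption,
    not the paper's transformer layer). Returns (pooled_feature, weights).
    """
    d = len(token)
    scores = [
        sum(t * x for t, x in zip(token, inst)) / math.sqrt(d)
        for inst in instances
    ]
    weights = softmax(scores)
    pooled = [
        sum(w * inst[c] for w, inst in zip(weights, instances))
        for c in range(d)
    ]
    return pooled, weights


if __name__ == "__main__":
    labels = [0, 1, 3, 1]                 # hypothetical per-image severities
    y = bag_label(labels)                 # bag label is the maximum
    print(y, ordinal_targets(y, num_classes=4))
    print(decode_rank([0.9, 0.8, 0.2]))   # two boundaries exceeded
```

In this sketch, training one token per boundary would amount to fitting each pooled feature against the corresponding entry of `ordinal_targets`, so that attention concentrates on instances whose severity exceeds that boundary.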