Deep Bayesian Active Learning-to-Rank with Relative Annotation for Estimation of Ulcerative Colitis Severity

Takeaki Kadota, Hideaki Hayashi, Ryoma Bise, Kiyohito Tanaka, Seiichi Uchida

TL;DR

This work tackles the high cost of annotating disease severity in ulcerative colitis by leveraging relative annotations within a deep Bayesian active learning-to-rank framework. A Siamese CNN trained with MC dropout provides rank scores and uncertainty estimates, enabling uncertainty-guided selection of informative image pairs for labeling. The authors provide a theoretical basis for applying MC dropout to pairwise ranking and demonstrate improved relative-severity estimation and multi-class classification on UC endoscopic datasets, with better handling of class imbalance and reduced annotation effort. The results indicate substantial annotation savings and robust performance, suggesting broad applicability of Bayesian active learning-to-rank for medical image ranking and severity estimation.
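The idea of obtaining a rank score together with an uncertainty estimate from MC dropout can be illustrated with a minimal sketch. It is not the authors' Siamese CNN: the toy linear scorer, the function names, and the use of score variance as the uncertainty measure are all assumptions made for illustration. Dropout stays active at inference time, several stochastic forward passes are averaged, and the images whose scores vary most are picked for annotation.

```python
import random
import statistics


def stochastic_score(features, weights, drop_prob, rng):
    """One forward pass of a toy linear scorer with dropout left switched on."""
    keep = 1.0 - drop_prob
    total = 0.0
    for x, w in zip(features, weights):
        if rng.random() < keep:        # this unit survives the dropout mask
            total += w * x / keep      # inverted-dropout rescaling
    return total


def mc_dropout_scores(features, weights, drop_prob=0.5, passes=50, seed=0):
    """Return (mean rank score, score variance) over `passes` stochastic passes."""
    rng = random.Random(seed)
    samples = [stochastic_score(features, weights, drop_prob, rng)
               for _ in range(passes)]
    return statistics.fmean(samples), statistics.pvariance(samples)


def select_most_uncertain(pool, weights, k, **kwargs):
    """Indices of the k images with the largest score variance (assumed criterion)."""
    scored = [(mc_dropout_scores(f, weights, **kwargs)[1], idx)
              for idx, f in enumerate(pool)]
    scored.sort(reverse=True)
    return [idx for _, idx in scored[:k]]


if __name__ == "__main__":
    # Hypothetical 3-dimensional "image features"; real inputs would be CNN activations.
    pool = [[0.0, 0.0, 0.0], [5.0, 5.0, 5.0], [0.1, 0.1, 0.1]]
    weights = [1.0, 1.0, 1.0]
    print(select_most_uncertain(pool, weights, k=1))
```

In a real setting the scorer would be the trained network with its dropout layers kept in training mode during inference; only the sampling-and-variance pattern carries over from this sketch.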

Abstract

Automatic image-based severity estimation is an important task in computer-aided diagnosis. Severity estimation by deep learning requires a large amount of training data to achieve high performance. In general, severity estimation uses training data annotated with discrete (i.e., quantized) severity labels. Annotating discrete labels is often difficult for images with ambiguous severity, and the annotation cost is high. In contrast, relative annotation, in which the severity of two images is compared, avoids quantizing severity and is therefore easier for annotators. Relative disease severity can be estimated using a learning-to-rank framework with relative annotations, but the number of pairs that could be annotated is enormous. Selecting appropriate pairs is therefore essential for relative annotation. In this paper, we propose a deep Bayesian active learning-to-rank method that automatically selects appropriate pairs for relative annotation. Our method uses the model uncertainty of the samples to preferentially annotate unlabeled pairs with high learning efficiency. We prove the theoretical basis for adapting Bayesian neural networks to pairwise learning-to-rank and demonstrate the efficiency of our method through experiments on endoscopic images of ulcerative colitis from both private and public datasets. We also show that our method achieves high performance under significant class imbalance because it automatically selects samples from the minority classes.
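Two points in the abstract lend themselves to a small worked example: how quickly the number of candidate pairs grows, and how a pairwise learning-to-rank objective turns two rank scores into a comparison. The sketch below assumes a RankNet-style logistic loss on the score difference; the abstract does not state the exact loss, so this choice and all function names are illustrative assumptions.

```python
import math


def num_pairs(n_images):
    """Number of unordered image pairs that could be annotated: n(n-1)/2."""
    return n_images * (n_images - 1) // 2


def pair_probability(score_a, score_b):
    """Modelled probability that image A is more severe than image B."""
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))


def pairwise_loss(score_a, score_b, label):
    """Binary cross-entropy on the score difference.

    label is 1.0 if the annotator judged A more severe, 0.0 if B
    (an assumed binary encoding of the relative annotation).
    """
    eps = 1e-12  # guards against log(0)
    p = pair_probability(score_a, score_b)
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))


if __name__ == "__main__":
    print(num_pairs(1000))                 # candidate pairs for 1,000 images
    print(pairwise_loss(3.0, 0.0, 1.0))    # scores agree with the label
    print(pairwise_loss(0.0, 3.0, 1.0))    # scores contradict the label
```

With 1,000 images there are already 499,500 candidate pairs, which is why annotating every pair is impractical and a selection strategy is needed.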

Paper Structure (35 sections, 14 equations, 8 figures, 3 tables)

Figures (8)

  • Figure 1: Absolute and relative annotations.
  • Figure 2: (a) Deep Bayesian active learning-to-rank for relative severity estimation; step 1 (green arrows): generating a small number of pairs using randomly selected images from an unlabeled image set and annotating these pairs for the initial training; step 2 (red arrow): training the Bayesian CNN using the labeled image pair set; step 3 (blue arrows): selecting high-uncertainty images from the unlabeled image set to create pairs and attaching relative labels to the pairs. (b) Multi-task learning for severity classification.
  • Figure 3: Examples of endoscopic images of ulcerative colitis at each Mayo score (severity level).
  • Figure 4: Accuracy of relative label estimates for the baseline (blue), Core-set (orange), the proposed method w/o UBS (green), and the proposed method (red) at each labeling ratio. The black dotted line indicates the result of the baseline (all data).
  • Figure 5: Box plots of estimated rank scores at each Mayo score. The initial estimates were obtained at a labeling ratio of $20\%$ (iteration $K=0$). The baseline and proposed-method estimates were obtained at a labeling ratio of $50\%$ (iteration $K=6$). An estimate is considered reasonable if the rank-score distributions for different Mayo scores overlap little.
  • ...and 3 more figures
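
The three steps described in the Figure 2(a) caption can be sketched as a generic loop. This is only a skeleton under stated assumptions: `oracle`, `train`, and `uncertainty` are hypothetical callables supplied by the caller (annotator, Bayesian CNN training, and MC-dropout uncertainty, respectively), and pairing the selected images at random is an assumption, since the caption does not say how selected images are paired.

```python
import random


def active_ranking_loop(unlabeled, oracle, train, uncertainty,
                        n_initial_pairs, n_select, n_iterations, seed=0):
    """Skeleton of an uncertainty-driven relative-annotation loop.

    oracle(a, b)          -> relative label for the pair (hypothetical annotator)
    train(labeled_pairs)  -> model fitted to the labeled pairs (hypothetical)
    uncertainty(model, x) -> uncertainty of image x under the model (hypothetical)
    """
    rng = random.Random(seed)
    labeled_pairs = []

    # Step 1: build a few pairs from randomly chosen images and annotate them.
    for _ in range(n_initial_pairs):
        a, b = rng.sample(unlabeled, 2)
        labeled_pairs.append((a, b, oracle(a, b)))

    for _ in range(n_iterations):
        # Step 2: train the model on the current labeled pair set.
        model = train(labeled_pairs)

        # Step 3: pick the most uncertain images, pair them, and annotate.
        ranked = sorted(unlabeled, key=lambda img: uncertainty(model, img),
                        reverse=True)
        chosen = ranked[:n_select]
        rng.shuffle(chosen)  # random pairing among selected images (assumption)
        for a, b in zip(chosen[0::2], chosen[1::2]):
            labeled_pairs.append((a, b, oracle(a, b)))

    # Final fit on everything annotated so far.
    return train(labeled_pairs), labeled_pairs
```

Each iteration adds `n_select // 2` newly annotated pairs, so the labeling budget is controlled by `n_select` and `n_iterations`.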