CAMEL2: Enhancing weakly supervised learning for histopathology images by incorporating the significance ratio

Gang Xu, Shuhao Wang, Lingyu Zhao, Xiao Chen, Tongwei Wang, Lang Wang, Zhenwei Luo, Dahan Wang, Zewen Zhang, Aijun Liu, Wei Ba, Zhigang Song, Huaiyin Shi, Dingrong Zhong, Jianpeng Ma

TL;DR

The results with various datasets demonstrate that CAMEL2, with the help of 5120 × 5120 image‐level binary annotations, which are easy to annotate, achieves comparable performance to that of a fully supervised baseline in both instance‐ and slide‐level classifications.

Abstract

Histopathology image analysis plays a crucial role in cancer diagnosis. However, training a clinically applicable segmentation algorithm requires pathologists to engage in labour-intensive labelling. In contrast, weakly supervised learning methods, which only require coarse-grained labels at the image level, can significantly reduce the labelling effort. Unfortunately, while these methods perform reasonably well in slide-level prediction, their ability to locate cancerous regions, which is essential for many clinical applications, remains unsatisfactory. Previously, we proposed CAMEL, which achieves results comparable to those of fully supervised baselines in pixel-level segmentation. However, CAMEL requires 1,280×1,280 image-level binary annotations for positive whole-slide images (WSIs). Here, we present CAMEL2, which introduces a threshold on the cancerous ratio for positive bags. This allows the available information to be used more effectively and lets us scale the image-level setting up from 1,280×1,280 to 5,120×5,120 while maintaining accuracy. Our results on various datasets demonstrate that CAMEL2, using easy-to-produce 5,120×5,120 image-level binary annotations, achieves performance comparable to that of a fully supervised baseline in both instance- and slide-level classification.

Paper Structure (4 sections, 13 figures, 4 tables)

This paper contains 4 sections, 13 figures, 4 tables.

Figures (13)

  • Figure 1: The framework of CAMEL2. a, Flowchart of data preparation for different variants of CAMEL2. Here, we use a WSI with 44,800$\times$43,008 pixels under 20$\times$ magnification in CAMELYON16 as an example. After splitting into 5,120$\times$5,120 image-level data and removing background, only 33 images remain for manual labelling. b, Training procedure of CAMEL2. 5,120$\times$5,120 image-level data are considered as bags, and each bag is divided into 256 instances of 320$\times$320 pixels. For positive WSIs, the instances with the top $t$% highest positive probability are selected and labelled positive. In CAMEL2, the threshold for positive bags is set to 20%; in CAMEL2-ratio, it is set to the ratio constraint specific to each bag.
  • Figure 2: Multiple instance MNIST task. a, Illustration of a binary classification experiment with the MNIST dataset. In this case, each bag contains 1,000 instances. A bag is labelled as positive if it contains at least one instance with a handwriting label of "0"; otherwise, it is labelled as negative. b, Binary classification results of CAMEL2 on the MNIST test set. The groups represent experiments with bag sizes of 1,000, 2,000, 5,000, 10,000, and 20,000, respectively. Ten experiments are conducted with the target label varying from "0" to "9". Since the number of positive instances in each positive bag is randomly sampled from a uniform distribution U(0, 1,000), the proportion of target instances in positive bags for each group ranges from 0-100%, 0-50%, 0-20%, 0-10%, and 0-5%, respectively.
  • Figure 3: Performance of different methods using CAMELYON16 test set WSIs with a relatively large ratio of cancerous regions.
  • Figure 4: Performance of different methods using CAMELYON16 test set WSIs with a relatively small ratio of cancerous regions.
  • Figure 5: Performance of different methods using the test set WSIs of the gastric cancer dataset.
  • ...and 8 more figures
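
The bag and instance arithmetic in the Figure 1 caption, together with the top-$t$% selection step it describes, can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function names are hypothetical, the rounding rule (round up, keep at least one instance) is an assumption, and the positive probabilities are assumed to come from whatever instance classifier is being trained. Passing a fixed 20 corresponds to the CAMEL2 setting in the caption, while a per-bag value stands in for the CAMEL2-ratio setting.

```python
import math

BAG_SIZE = 5120       # bag side length in pixels (Figure 1 caption)
INSTANCE_SIZE = 320   # instance side length in pixels (Figure 1 caption)


def instances_per_bag(bag_size=BAG_SIZE, instance_size=INSTANCE_SIZE):
    """Count the non-overlapping square instances that tile one square bag."""
    per_side = bag_size // instance_size
    return per_side * per_side


def select_top_instances(probs, t_percent):
    """Return the indices of the top t% instances by positive probability.

    Hypothetical helper. Assumption: the count is rounded up and at least
    one instance is always kept; the paper's exact rule is not given here.
    """
    k = max(1, math.ceil(len(probs) * t_percent / 100))
    # Rank instance indices from highest to lowest positive probability.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    print(instances_per_bag())  # 256, matching the caption
    toy_probs = [0.1, 0.9, 0.5, 0.7, 0.2]
    print(select_top_instances(toy_probs, t_percent=40))
```

In this sketch the selected indices would be the instances treated as positive for the next training round; how the remaining instances in a positive bag are handled is outside what the caption specifies.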