Semi-Supervised Semantic Segmentation using Redesigned Self-Training for White Blood Cells

Vinh Quoc Luu, Duy Khanh Le, Huy Thanh Nguyen, Minh Thanh Nguyen, Thinh Tien Nguyen, Vinh Quang Dinh

TL;DR

The paper tackles the lack of large labeled white blood cell (WBC) segmentation datasets by proposing a semi-supervised self-training framework that integrates FixMatch for consistency regularization. It adopts a two-stage ST/ST++ pipeline that leverages unlabeled data through pseudo-labeling and selective re-training, enabling end-to-end learning with both supervised and unsupervised losses. On the Zheng 1, Zheng 2, and LISC datasets, the approach performs best with DeepLab-V3+ and a ResNet-50 backbone, reaching 90.69%, 87.37%, and 76.49%, respectively. However, applying FixMatch during the supervised stage can reduce accuracy on labeled data, and pseudo-masks are less reliable on complex WBC images, indicating a need for domain-specific refinements. Overall, the method demonstrates the potential of semi-supervised learning for WBC segmentation and generalizes across datasets, albeit with limitations tied to intra- and inter-image variability.
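
ST++, the selective variant of self-training mentioned above, does not re-train on every pseudo-mask at once. As we understand the original ST++ design, it ranks unlabeled images by how stable their pseudo-masks remain across checkpoints saved during supervised training, re-trains first on the most reliable subset, and only then re-labels and adds the rest. The following is a minimal sketch of that ranking step, assuming pseudo-masks are given as 2-D grids of class indices, one per saved checkpoint. The helper names (`mean_iou`, `stability_score`, `select_reliable`) and the default 50% selection ratio are hypothetical illustrations, not code or settings from the paper.

```python
from typing import Dict, List

# A pseudo-mask is a 2-D grid of class indices (e.g. 0 = background, 1 = WBC).
Mask = List[List[int]]


def mean_iou(pred: Mask, ref: Mask, num_classes: int) -> float:
    """Mean IoU over the classes present in either mask."""
    ious = []
    for c in range(num_classes):
        inter = union = 0
        for row_p, row_r in zip(pred, ref):
            for p, r in zip(row_p, row_r):
                inter += int(p == c and r == c)
                union += int(p == c or r == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 1.0


def stability_score(checkpoint_masks: List[Mask], num_classes: int) -> float:
    """Average IoU between each earlier checkpoint's pseudo-mask and the final one.

    A high score means the prediction barely changed during training, which
    ST++-style selection treats as a sign of a reliable pseudo-mask.
    """
    if len(checkpoint_masks) < 2:
        raise ValueError("need pseudo-masks from at least two checkpoints")
    final, earlier = checkpoint_masks[-1], checkpoint_masks[:-1]
    return sum(mean_iou(m, final, num_classes) for m in earlier) / len(earlier)


def select_reliable(masks_per_image: Dict[str, List[Mask]], num_classes: int,
                    ratio: float = 0.5) -> List[str]:
    """Return the ids of the most stable `ratio` fraction of unlabeled images."""
    ranked = sorted(masks_per_image,
                    key=lambda k: stability_score(masks_per_image[k], num_classes),
                    reverse=True)
    return ranked[: max(1, int(len(ranked) * ratio))]


if __name__ == "__main__":
    masks = {
        "img_a": [[[0, 1], [1, 1]]] * 3,  # identical across checkpoints: stable
        "img_b": [[[1, 1], [0, 0]], [[0, 0], [1, 1]], [[0, 1], [1, 1]]],  # fluctuates
    }
    print(select_reliable(masks, num_classes=2))  # ['img_a']
```

Averaging rather than summing the per-checkpoint IoUs leaves the ranking unchanged as long as every image has the same number of checkpoints. In a full pipeline, the selected images would feed the first re-training round, and the remaining images would be re-labeled by the resulting model before the final round.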

Abstract

Artificial Intelligence (AI) in healthcare, especially in white blood cell cancer diagnosis, is hindered by two primary challenges: the lack of large-scale labeled datasets for white blood cell (WBC) segmentation and outdated segmentation methods. These challenges inhibit the development of more accurate and modern techniques for diagnosing cancers related to white blood cells. To address the first challenge, a semi-supervised learning framework is needed that makes efficient use of the scarce labeled data available. In this work, we address this issue by proposing a novel self-training pipeline that incorporates FixMatch. Self-training is a technique in which a model trained on labeled data generates pseudo-labels for the unlabeled data and is then re-trained on both. FixMatch is a consistency-regularization algorithm that enforces the model's robustness against variations in the input image. We find that incorporating FixMatch into the self-training pipeline improves performance in the majority of cases. The best performance is achieved by the self-training scheme with consistency regularization on the DeepLab-V3 architecture with a ResNet-50 backbone, reaching 90.69%, 87.37%, and 76.49% on the Zheng 1, Zheng 2, and LISC datasets, respectively.
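
FixMatch enforces consistency between two views of the same unlabeled image. The sketch below shows one common way to apply its weak-to-strong rule per pixel for segmentation: the prediction on a weakly augmented view becomes a hard pseudo-label wherever its confidence reaches a threshold `tau`, and the prediction on a strongly augmented view is trained toward that label with cross-entropy. It assumes per-pixel class probabilities for both views are already computed (the augmentations and the network are out of scope). The function names, the weight `lam`, and the default `tau = 0.95` (the value used in the original FixMatch paper) are illustrative assumptions, not the authors' settings.

```python
import math
from typing import List

# Per-pixel class probabilities for one flattened image: probs[i][c] = P(class c | pixel i).
Probs = List[List[float]]


def fixmatch_pixel_loss(weak_probs: Probs, strong_probs: Probs, tau: float = 0.95) -> float:
    """Unsupervised FixMatch-style loss over the pixels of one unlabeled image.

    Pixels whose weak-view confidence is at least `tau` get a hard pseudo-label
    (the weak-view argmax); the strong-view prediction is penalized with
    cross-entropy against it. Low-confidence pixels are masked out, but the sum
    is still divided by the total pixel count, as in FixMatch.
    """
    if len(weak_probs) != len(strong_probs) or not weak_probs:
        raise ValueError("both views must have the same, non-zero number of pixels")
    total = 0.0
    for pw, ps in zip(weak_probs, strong_probs):
        label = max(range(len(pw)), key=pw.__getitem__)  # weak-view argmax
        if pw[label] >= tau:
            total += -math.log(max(ps[label], 1e-12))  # clamp avoids log(0)
    return total / len(weak_probs)


def combined_loss(supervised: float, unsupervised: float, lam: float = 1.0) -> float:
    """Total objective: supervised loss on labeled data plus weighted consistency loss."""
    return supervised + lam * unsupervised


if __name__ == "__main__":
    weak = [[0.97, 0.03], [0.60, 0.40]]  # second pixel falls below tau and is ignored
    strong = [[0.80, 0.20], [0.10, 0.90]]
    print(fixmatch_pixel_loss(weak, strong))  # equals -ln(0.8) / 2
```

Dividing by all pixels, rather than only the confident ones, keeps the loss small early in training when few pixels pass the threshold.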

Paper Structure

This paper contains 16 sections, 4 figures, 2 tables, 1 algorithm.

Figures (4)

  • Figure 1: Our proposed semi-supervised semantic segmentation framework. Consistency regularization is incorporated in the first stage of training, using a weak-to-strong mechanism to improve performance on unlabeled images. It can be incorporated into both the ST and ST++ frameworks; the diagram uses ST++ as an example.
  • Figure 2: Sample ground-truth mask, non-FixMatch mask, and FixMatch mask generated from the Zheng 1 dataset.
  • Figure 3: Deviations of the non-FixMatch and FixMatch masks from the ground-truth mask on the LISC dataset.
  • Figure 4: The high variability among LISC's images significantly reduces the model's performance.
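
Figures 2 and 3 compare generated masks against the ground truth. As an illustration of how such deviations can be quantified pixel by pixel, the sketch below builds a confusion map (true/false positives and negatives) for a binary WBC-versus-background mask and derives a Dice score from it. It is a minimal sketch assuming binary masks given as 2-D lists; the function names are hypothetical, and this summary does not state which metric underlies the reported percentages, so Dice is not implied to be that metric.

```python
from typing import Dict, List, Tuple

# Binary mask: nonzero = WBC pixel, 0 = background.
Mask = List[List[int]]


def deviation_map(pred: Mask, gt: Mask) -> Tuple[List[List[str]], Dict[str, int]]:
    """Label each pixel TP, FP (predicted but absent in GT), FN (missed), or TN."""
    if len(pred) != len(gt) or any(len(a) != len(b) for a, b in zip(pred, gt)):
        raise ValueError("masks must have the same shape")
    codes = {(1, 1): "TP", (1, 0): "FP", (0, 1): "FN", (0, 0): "TN"}
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    out = []
    for row_p, row_g in zip(pred, gt):
        out_row = []
        for p, g in zip(row_p, row_g):
            code = codes[(int(p > 0), int(g > 0))]
            counts[code] += 1
            out_row.append(code)
        out.append(out_row)
    return out, counts


def dice(counts: Dict[str, int]) -> float:
    """Dice = 2TP / (2TP + FP + FN); defined as 1.0 when both masks are empty."""
    denom = 2 * counts["TP"] + counts["FP"] + counts["FN"]
    return 1.0 if denom == 0 else 2 * counts["TP"] / denom


if __name__ == "__main__":
    gt = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
    pred = [[0, 1, 1], [1, 1, 0], [0, 0, 0]]
    dev, counts = deviation_map(pred, gt)
    print(counts)  # {'TP': 3, 'FP': 1, 'FN': 1, 'TN': 4}
    print(dice(counts))  # 0.75
```

Rendering the FP and FN cells in distinct colors produces the kind of deviation overlay the figures describe.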