Exploring learning environments for label-efficient cancer diagnosis

Samta Rani, Tanvir Ahmad, Sarfaraz Masood, Chandni Saxena

TL;DR

The paper tackles label-efficient cancer diagnosis by comparing supervised, semi-supervised, and self-supervised learning across three histopathology datasets (breast, lung, kidney) using three pretrained backbones. It introduces a seven-training-set framework (TS1–TS7) with varying labeled/unlabeled ratios and leverages pseudo-labeling for Semi-SL and SimCLR-based contrastive learning for Self-SL. Across experiments, Semi-SL consistently approximates supervised performance, while EfficientNetB0 delivers the best overall accuracy, and Self-SL lags yet remains informative. The findings support practical deployment of label-efficient strategies in medical imaging, enabling effective cancer prediction with limited labeled data and offering generalizability across modalities and cancer types.
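To make the SimCLR-based contrastive objective mentioned above more concrete, here is a minimal sketch of the normalized temperature-scaled cross-entropy (NT-Xent) loss that SimCLR uses. It is written in plain Python for readability and is not the authors' implementation: the function names `cosine` and `nt_xent`, the pairing convention (views of the same image stored at indices `2k` and `2k+1`), and the temperature of 0.5 are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent(z, temperature=0.5):
    """NT-Xent loss over 2N embeddings (hypothetical helper).

    Assumed layout: z[2k] and z[2k + 1] are embeddings of two augmented
    views of the same image k. Every other embedding in the batch acts
    as a negative for that pair.
    """
    n = len(z)
    total = 0.0
    for i in range(n):
        j = i ^ 1  # index of the positive partner (0<->1, 2<->3, ...)
        positive = cosine(z[i], z[j]) / temperature
        # The denominator sums over all other embeddings, positive included.
        denominator = sum(
            math.exp(cosine(z[i], z[k]) / temperature)
            for k in range(n)
            if k != i
        )
        total += math.log(denominator) - positive
    return total / n


# Two images, two views each. In `aligned`, views of the same image agree.
aligned = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
# In `misaligned`, each view matches the other image instead of its partner.
misaligned = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]

loss_aligned = nt_xent(aligned)
loss_misaligned = nt_xent(misaligned)
```

The loss is lower when the two views of each image map to similar embeddings, which is the signal that lets a backbone learn from unlabeled images before any labels are used.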

Abstract

Despite significant research efforts and advancements, cancer remains a leading cause of mortality. Early cancer prediction has become a crucial focus in cancer research to streamline patient care and improve treatment outcomes. Manual tumor detection by histopathologists can be time-consuming, prompting the need for computerized methods to expedite treatment planning. Traditional approaches to tumor detection rely on supervised learning, which necessitates a large amount of annotated data for model training. However, acquiring such extensive labeled data can be laborious and time-intensive. This research examines three learning environments, supervised learning (SL), semi-supervised learning (Semi-SL), and self-supervised learning (Self-SL), to predict kidney, lung, and breast cancer. Three pre-trained deep learning models (Residual Network-50, Visual Geometry Group-16, and EfficientNetB0) are evaluated under these learning settings using seven carefully curated training sets. The first training set (TS1) contains all annotated image samples and is used for SL. Five training sets (TS2-TS6) with different ratios of labeled and unlabeled cancer images are used to evaluate Semi-SL. Unlabeled cancer images from the final training set (TS7) are used for Self-SL assessment. Among the learning environments, outcomes from the Semi-SL setting show a strong degree of agreement with the outcomes achieved in the SL setting. The uniform pattern of observations from the pre-trained models across all three datasets supports the methodology and techniques of the research. Given its modest requirement for labeled samples and minimal computing cost, our study suggests that Semi-SL can be a highly viable replacement for SL when label annotation is constrained.
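The Semi-SL training sets described above mix labeled and unlabeled images in different ratios. A minimal sketch of how such a set could be curated, and how confident predictions on the unlabeled part could then be turned into pseudo-labels, is shown below. The abstract does not state the exact ratios, the splitting procedure, or a confidence threshold, so the class-stratified split, the `labeled_fraction` value, the 0.9 threshold, and the helper names `split_labeled_unlabeled` and `pseudo_label` are all assumptions for illustration.

```python
import random


def split_labeled_unlabeled(samples, labeled_fraction, seed=0):
    """Split (image_id, label) pairs into a labeled and an unlabeled part.

    Assumption: the split is stratified by class, so each class keeps the
    same labeled fraction. Labels of the unlabeled part are discarded.
    """
    rng = random.Random(seed)
    by_class = {}
    for image_id, label in samples:
        by_class.setdefault(label, []).append(image_id)

    labeled, unlabeled = [], []
    for label in sorted(by_class):
        ids = list(by_class[label])
        rng.shuffle(ids)
        keep = round(len(ids) * labeled_fraction)
        labeled.extend((image_id, label) for image_id in ids[:keep])
        unlabeled.extend(ids[keep:])
    return labeled, unlabeled


def pseudo_label(probabilities, threshold=0.9):
    """Keep predictions whose top class probability reaches the threshold.

    probabilities: dict mapping image_id to a list of class probabilities,
    as a model trained on the labeled part might produce (hypothetical).
    Returns (image_id, class_index) pairs to add to the training data.
    """
    accepted = []
    for image_id, probs in probabilities.items():
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            accepted.append((image_id, best))
    return accepted


# Toy dataset: 10 benign ("B") and 10 malignant ("M") image identifiers.
samples = [(f"b{i}", "B") for i in range(10)] + [(f"m{i}", "M") for i in range(10)]
labeled, unlabeled = split_labeled_unlabeled(samples, labeled_fraction=0.5)

# Made-up model outputs for three unlabeled images (class 0 = B, class 1 = M).
predictions = {"x1": [0.95, 0.05], "x2": [0.60, 0.40], "x3": [0.08, 0.92]}
accepted = pseudo_label(predictions, threshold=0.9)
```

In this toy run, only the two confident predictions are kept as pseudo-labels; the uncertain one stays unlabeled, which limits how much label noise enters the next training round.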

Paper Structure (16 sections, 4 equations, 8 figures, 8 tables)


Figures (8)

  • Figure 1: Process flow generalization for supervised learning (SL), semi-supervised learning (Semi-SL), and self-supervised learning (Self-SL)
  • Figure 2: Workflow for exploring three learning settings (Setting 1, Setting 2, and Setting 3) and deploying models on three datasets for predicting breast cancer, lung cancer, and kidney cancer
  • Figure 3: Process flow followed for the curation of seven training sets (TS1 to TS7) based on three learning settings
  • Figure 4: Sample distribution of seven training sets with labeled (L) and unlabeled (U) sample ratios for breast cancer, lung cancer, and kidney cancer datasets where B: Benign, M: Malignant
  • Figure 5: Accuracy graph for breast cancer dataset
  • ...and 3 more figures