OxML Challenge 2023: Carcinoma classification using data augmentation

Kislay Raj, Teerath Kumar, Alessandra Mileo, Malika Bendechache

TL;DR

This work tackles carcinoma classification in the OxML 2023 challenge under severe data scarcity and imbalance. It proposes a padding-based data augmentation strategy to handle variable image sizes and preserve features, complemented by an ensemble of five pretrained CNNs trained with full fine-tuning and SGD optimization. The approach achieves top-three placement among 39 teams, demonstrating robust generalization on a limited dataset and highlighting the value of simple, well-regularized augmentations combined with model ensembling. The study underscores the practical impact of padding and ensembling for medical image analysis when data privacy limits data availability, with potential benefits for early carcinoma detection.

Abstract

Carcinoma is the most prevalent type of cancer and can develop in many parts of the body. In the medical domain, carcinoma data is often limited or unavailable due to privacy concerns. Moreover, when available, it is highly imbalanced, with few positive-class samples and an abundance of negative ones. The OxML 2023 challenge provides a small and imbalanced dataset, presenting significant challenges for carcinoma classification. To tackle these issues, participants in the challenge have employed various approaches, relying on pre-trained models, preprocessing techniques, and few-shot learning. Our work proposes a novel technique that combines padding augmentation and ensembling to address the carcinoma classification challenge. In our proposed method, we use an ensemble of five neural networks and implement padding as a data augmentation technique, taking into account varying image sizes to enhance the classifier's performance. Using our approach, we placed in the top three and were declared a winner.
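
The two ingredients named in the abstract, padding for images of varying size and an ensemble of five networks, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the zero fill value, the centred placement, the square canvas, and the rule of averaging softmax probabilities across models are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def pad_to_square(image: np.ndarray, size: int, fill: float = 0.0) -> np.ndarray:
    """Centre an H x W x C image on a size x size canvas filled with `fill`.

    Assumption: constant padding with centred placement. Padding avoids the
    resizing or cropping that would distort or discard image content.
    """
    h, w = image.shape[:2]
    if h > size or w > size:
        raise ValueError("image is larger than the target canvas")
    top = (size - h) // 2
    left = (size - w) // 2
    pad_width = ((top, size - h - top), (left, size - w - left), (0, 0))
    return np.pad(image, pad_width, mode="constant", constant_values=fill)


def softmax(logits: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def ensemble_predict(logits_per_model):
    """Average per-model class probabilities; return (probabilities, labels).

    Assumption: soft voting. `logits_per_model` is a list with one
    (n_samples, n_classes) array per ensemble member.
    """
    probs = np.mean([softmax(l) for l in logits_per_model], axis=0)
    return probs, probs.argmax(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two images of different sizes end up with the same shape after padding.
    small = rng.random((120, 200, 3))
    large = rng.random((224, 180, 3))
    batch = np.stack([pad_to_square(im, 256) for im in (small, large)])
    print(batch.shape)  # (2, 256, 256, 3)

    # Stand-in logits from five hypothetical models; the class count of 3
    # is arbitrary and chosen only for this demonstration.
    logits = [rng.normal(size=(2, 3)) for _ in range(5)]
    probs, labels = ensemble_predict(logits)
    print(probs.shape, labels.shape)
```

In practice each entry of `logits_per_model` would come from one fine-tuned network evaluated on the padded batch. Averaging probabilities rather than taking a majority vote is one common choice and keeps the output usable as a confidence score.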

Paper Structure (9 sections, 1 figure, 1 table)

This paper contains 9 sections, 1 figure, 1 table.

Figures (1)

  • Figure 1: Training and testing using ensemble technique