An Ensemble-Based Two-Step Framework for Classification of Pap Smear Cell Images

Theo Di Piazza, Loic Boussel

TL;DR

This work addresses automated Pap smear image classification to aid cervical cancer screening by proposing a two-stage ensemble framework. The pipeline first detects images that are unsuitable for diagnosis (the "rubbish" class) and then classifies the remaining images as healthy, unhealthy, or both. Each stage uses multiple pretrained backbones (CNNs and Vision Transformers) and averages predicted probabilities across models and folds. Trained and evaluated on the APACC dataset with 5-fold cross-validation, the ensemble achieves higher macro-F1 and AUROC than the individual models, indicating robust performance under class imbalance and image artifacts. The approach offers a practical, scalable tool to assist cytologists and motivates future work on boosting or meta-learning for model fusion.

Abstract

Early detection of cervical cancer is crucial for improving patient outcomes and reducing mortality by identifying precancerous lesions as soon as possible. As a result, the use of Pap smear screening has increased significantly, leading to a growing demand for automated tools that can help cytologists manage their rising workload. To address this, the Pap Smear Cell Classification Challenge (PS3C) was organized in association with ISBI 2025. The challenge aims to promote the development of automated tools for Pap smear image classification. The images are grouped into four categories: healthy, unhealthy, both, and rubbish, the last covering images considered unsuitable for diagnosis. In this work, we propose a two-stage ensemble approach: first, a neural network determines whether an image is rubbish. If it is not, a second neural network classifies the image as containing a healthy cell, an unhealthy cell, or both.

Paper Structure

This paper contains 15 sections, 7 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Example of a cell for each class from the APACC dataset.
  • Figure 2: Frequency of classes in the train, validation and test sets from the APACC dataset.
  • Figure 3: Overview of the method. Step 1: Models are independently trained for binary classification to predict whether an image is rubbish or not. Final predictions are obtained by averaging the model scores. If the image is classified as non-rubbish, it proceeds to Step 2. Step 2: Models are separately trained for multi-label classification to determine whether the input image contains a healthy cell, an unhealthy cell, or both. Final predictions are computed as the average of model predictions.
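
The two-step inference described in Figure 3 can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the 0.5 thresholds, and the fallback used when neither label passes the threshold are all assumptions made for the example. It takes per-model probabilities as plain numbers, so no trained networks are needed to run it.

```python
from statistics import fmean
from typing import Sequence, Tuple


def average_scores(per_model_scores: Sequence[Sequence[float]]) -> list:
    """Average scores element-wise across models (one inner sequence per model)."""
    return [fmean(column) for column in zip(*per_model_scores)]


def two_step_predict(
    rubbish_scores: Sequence[float],
    cell_scores: Sequence[Tuple[float, float]],
    rubbish_threshold: float = 0.5,  # assumed value, not taken from the paper
    label_threshold: float = 0.5,    # assumed value, not taken from the paper
) -> str:
    """Combine per-model probabilities into one of the four classes.

    rubbish_scores: one P(rubbish) per Step 1 model.
    cell_scores: one (P(healthy), P(unhealthy)) pair per Step 2 model.
    """
    # Step 1: average the binary rubbish scores over all models.
    p_rubbish = fmean(rubbish_scores)
    if p_rubbish >= rubbish_threshold:
        return "rubbish"

    # Step 2: average the multi-label scores over all models.
    p_healthy, p_unhealthy = average_scores(cell_scores)
    is_healthy = p_healthy >= label_threshold
    is_unhealthy = p_unhealthy >= label_threshold

    if is_healthy and is_unhealthy:
        return "both"
    if is_unhealthy:
        return "unhealthy"
    if is_healthy:
        return "healthy"
    # Hypothetical fallback: neither label passed, so pick the larger score.
    return "unhealthy" if p_unhealthy > p_healthy else "healthy"


if __name__ == "__main__":
    # Three Step 1 models and two Step 2 models, with made-up probabilities.
    print(two_step_predict([0.1, 0.2, 0.3], [(0.9, 0.2), (0.7, 0.4)]))
```

Treating Step 2 as two independent labels, rather than a three-way softmax, is what lets the "both" class emerge when the averaged healthy and unhealthy scores each pass the threshold.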