
Impact of imperfect annotations on CNN training and performance for instance segmentation and classification in digital pathology

Laura Gálvez Jiménez, Christine Decaestecker

TL;DR

This work investigates how to determine an appropriate number of training epochs to prevent overfitting to annotation noise, and shows that a small, correctly annotated validation set is instrumental in avoiding overfitting and largely maintaining model performance.

Abstract

Segmentation and classification of large numbers of instances, such as cell nuclei, are crucial tasks in digital pathology for accurate diagnosis. However, the availability of high-quality datasets for deep learning methods is often limited by the complexity of the annotation process. In this work, we investigate the impact of noisy annotations on the training and performance of a state-of-the-art CNN model for the combined task of detecting, segmenting and classifying nuclei in histopathology images. In this context, we investigate the conditions for determining an appropriate number of training epochs to prevent overfitting to annotation noise during training. Our results indicate that using a small, correctly annotated validation set is instrumental in avoiding overfitting and largely maintains model performance. Additionally, our findings underscore the beneficial role of pre-training.
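The epoch-selection idea in the abstract can be illustrated with a minimal sketch. It assumes a patience-based early-stopping rule driven by the loss on the small, correctly annotated validation set; the abstract does not state the exact criterion, so this is a common choice rather than the authors' procedure. The function and argument names (select_epoch_by_clean_validation, train_one_epoch, clean_val_loss, patience) are hypothetical.

```python
from typing import Callable, List, Tuple


def select_epoch_by_clean_validation(
    train_one_epoch: Callable[[int], None],
    clean_val_loss: Callable[[], float],
    max_epochs: int,
    patience: int = 5,
) -> Tuple[int, List[float]]:
    """Hypothetical helper: train for up to max_epochs and return the 1-based
    epoch with the lowest loss on a clean validation set, plus the loss history.

    Assumption: training stops once the clean validation loss has not improved
    for `patience` consecutive epochs (standard early stopping).
    """
    best_epoch, best_loss = 0, float("inf")
    history: List[float] = []
    epochs_without_improvement = 0
    for epoch in range(1, max_epochs + 1):
        train_one_epoch(epoch)   # one pass over the (possibly noisy) training set
        loss = clean_val_loss()  # evaluated against correctly annotated data only
        history.append(loss)
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
            epochs_without_improvement = 0
            # A real pipeline would checkpoint the model weights here.
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # the model is likely starting to fit the annotation noise
    return best_epoch, history


if __name__ == "__main__":
    # Simulated clean-validation losses: they fall, then rise as the model overfits.
    simulated = iter([0.9, 0.7, 0.55, 0.5, 0.52, 0.56, 0.61, 0.67, 0.72, 0.8])
    best, hist = select_epoch_by_clean_validation(
        lambda epoch: None, lambda: next(simulated), max_epochs=10, patience=3
    )
    print(best, len(hist))  # prints "4 7"
```

The validation set only drives model selection, so it can stay small; the key requirement in this sketch is that its annotations are clean, otherwise the stopping signal would itself track the noise.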

Paper Structure

This paper contains 29 sections, 1 equation, 3 figures, 14 tables, 2 algorithms.

Figures (3)

  • Figure 1: (a-c) Examples of annotations and (d-f) the corresponding corrupted annotations introduced into the clean MoNuSAC training subset. The classification masks show epithelial cells in red, lymphocytes in blue and neutrophils in green.
  • Figure 2: Loss obtained when evaluating the training stages of Algorithm hover_training.
  • Figure A.1: Flowchart of the experiments performed (see details in the main text).
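Figure 1 shows corrupted annotations, but this summary does not describe the corruption model. As a purely hypothetical illustration of one kind of classification noise, the sketch below reassigns each nucleus instance's class to a different, uniformly chosen class with probability flip_rate. The function name, the flip_rate parameter and the uniform flipping scheme are assumptions, not the paper's protocol.

```python
import random
from typing import Dict, Sequence


def corrupt_instance_labels(
    labels: Dict[int, str],
    classes: Sequence[str],
    flip_rate: float,
    seed: int = 0,
) -> Dict[int, str]:
    """Hypothetical label-noise model: map instance id -> class name, flipping
    each instance's class to another class with probability flip_rate.

    The input dict is not modified; a new dict is returned.
    """
    if not 0.0 <= flip_rate <= 1.0:
        raise ValueError("flip_rate must be in [0, 1]")
    rng = random.Random(seed)  # seeded for reproducible corruption
    corrupted: Dict[int, str] = {}
    for inst_id, cls in labels.items():
        candidates = [c for c in classes if c != cls]
        if candidates and rng.random() < flip_rate:
            corrupted[inst_id] = rng.choice(candidates)  # wrong class, uniformly
        else:
            corrupted[inst_id] = cls  # annotation left intact
    return corrupted


if __name__ == "__main__":
    # Classes taken from the Figure 1 caption; instance ids are made up.
    classes = ["epithelial", "lymphocyte", "neutrophil"]
    labels = {1: "epithelial", 2: "lymphocyte", 3: "neutrophil", 4: "epithelial"}
    print(corrupt_instance_labels(labels, classes, flip_rate=0.5, seed=42))
```

Segmentation noise (for example, missing or misshapen instance outlines) would need a separate, mask-level model and is not covered by this sketch.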