
Self-Supervised ImageNet Representations for In Vivo Confocal Microscopy: Tortuosity Grading without Segmentation Maps

Kim Ouan, Noémie Moreau, Katarzyna Bozek

Abstract

The tortuosity of corneal nerve fibers is used as an indicator of various diseases. Current state-of-the-art methods for grading tortuosity rely heavily on expensive segmentation maps of these nerve fibers. In this paper, we demonstrate that self-supervised pretrained features from ImageNet are transferable to the domain of in vivo confocal microscopy. We show that DINO should not be disregarded as a deep learning model for medical imaging, although it has been superseded by two later versions. After careful fine-tuning, DINO improves upon the state of the art in terms of accuracy (84.25%) and sensitivity (77.97%). Our fine-tuned model focuses on the key morphological elements in grading without the use of segmentation maps.

Paper Structure (5 sections, 2 figures, 4 tables)

Figures (2)

  • Figure 1: (a) Example images from the $\mathrm{CORN}^{1500}$ dataset, ordered from the mildest (level 1, left) to the most severe (level 4, right). (b) UMAP visualization of $\mathrm{CORN}^{1500}$ features extracted by the frozen DINO backbone, with points marked by tortuosity level (levels 1 to 4).
  • Figure 2: (a) Attention masks over CORN-3 IVCM images of level 1 (left) and level 4 (right), covering 60% of the attention mass of the last encoder layer. (b) Confusion matrix of our fine-tuned model; misclassifications occur only between adjacent levels.
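
The "60% of the attention mass" in Figure 2(a) can be illustrated with a minimal sketch. It assumes the last-layer attention has already been reduced to one non-negative weight per image patch (for example, attention from the class token averaged over heads), which the caption does not specify. The function name `attention_mass_mask` and the toy 2x2 grid are hypothetical and are not taken from the paper's code.

```python
import numpy as np

def attention_mass_mask(attn, mass=0.6):
    """Keep the highest-attention patches whose weights sum to at least
    `mass` of the total attention. Hypothetical helper for illustration."""
    attn = np.asarray(attn, dtype=float)
    flat = attn.ravel()
    flat = flat / flat.sum()              # normalize weights to sum to 1
    order = np.argsort(flat)[::-1]        # patch indices, highest first
    cumulative = np.cumsum(flat[order])   # running attention mass
    # Smallest number of patches whose cumulative mass reaches `mass`.
    n_keep = int(np.searchsorted(cumulative, mass)) + 1
    n_keep = min(n_keep, flat.size)       # guard against rounding error
    mask = np.zeros(flat.size, dtype=bool)
    mask[order[:n_keep]] = True
    return mask.reshape(attn.shape)

# Toy 2x2 patch grid (assumed values, not data from the paper).
example = np.array([[0.50, 0.25],
                    [0.15, 0.10]])
mask = attention_mass_mask(example, mass=0.6)
print(mask)
```

In this toy grid the two strongest patches together hold 75% of the attention, so they are the smallest set that reaches the 60% target. Upsampling such a patch-level mask to image resolution would give an overlay of the kind the caption describes.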