Self-supervised learning via inter-modal reconstruction and feature projection networks for label-efficient 3D-to-2D segmentation
José Morano, Guilherme Aresta, Dmitrii Lachinov, Julia Mai, Ursula Schmidt-Erfurth, Hrvoje Bogunović
TL;DR
The paper tackles label-efficient 3D→2D segmentation in optical coherence tomography (OCT) by introducing a full-volume CNN with a 3D encoder and a 2D decoder linked through novel 3D→2D feature projection blocks (FPB). It pairs this architecture with a self-supervised learning (SSL) pretraining scheme that reconstructs pairs of modalities with different dimensionality (e.g., 3D OCT to 2D SLO or FAF) to learn robust representations without labels. On geographic atrophy (GA) and reticular pseudodrusen (RPD) segmentation tasks, the proposed CNN outperforms state-of-the-art methods in low-data settings by up to 8% in Dice score without SSL, and SSL pretraining improves this further by up to 23%. FAF-based SSL often provides the higher gains, while SLO-based SSL has the advantage of requiring no registration. The findings suggest that the SSL paradigm may apply to other 3D→2D tasks and multi-modal medical imaging domains, enabling more data-efficient deployment in clinical workflows.
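To make the 3D→2D shape change concrete, the following is a minimal sketch that collapses the depth axis of a volume into a single en-face map. It is not the paper's FPB: the actual blocks are learned, whereas this stand-in uses a fixed mean over depth. The function name `project_depth_mean` and the `volume[z][y][x]` indexing convention are assumptions made for illustration.

```python
def project_depth_mean(volume):
    """Collapse a 3D volume indexed as volume[z][y][x] into a 2D map [y][x].

    Assumption: z is the axial (depth) axis. A fixed mean over depth is used
    here only to illustrate the shape change; the paper's feature projection
    blocks are learned and are not reproduced by this function.
    """
    depth = len(volume)
    height = len(volume[0])
    width = len(volume[0][0])
    return [
        [sum(volume[z][y][x] for z in range(depth)) / depth for x in range(width)]
        for y in range(height)
    ]


# Toy volume with 2 depth slices of size 2x2.
toy_volume = [
    [[1, 2], [3, 4]],
    [[3, 4], [5, 6]],
]
en_face = project_depth_mean(toy_volume)
```

The output has the same height and width as each input slice but no depth axis, which is the shape a 2D decoder needs in order to produce an en-face segmentation map.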
Abstract
Deep learning has become a valuable tool for the automation of certain medical image segmentation tasks, significantly relieving the workload of medical specialists. Some of these tasks require segmentation to be performed on a subset of the input dimensions, the most common case being 3D-to-2D. However, the performance of existing methods is strongly conditioned by the amount of labeled data available, as no data-efficient method (e.g., transfer learning) has yet been validated on these tasks. In this work, we propose a novel convolutional neural network (CNN) and a self-supervised learning (SSL) method for label-efficient 3D-to-2D segmentation. The CNN is composed of a 3D encoder and a 2D decoder connected by novel 3D-to-2D blocks. The SSL method consists of reconstructing image pairs of modalities with different dimensionality. The approach has been validated on two clinically relevant tasks: the en-face segmentation of geographic atrophy and of reticular pseudodrusen in optical coherence tomography. Results on different datasets demonstrate that the proposed CNN significantly improves the state of the art in scenarios with limited labeled data, by up to 8% in Dice score. Moreover, the proposed SSL method improves this performance further, by up to 23%, and we show that SSL is beneficial regardless of the network architecture.
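The results above are reported as Dice scores. As a reminder of what that metric measures, here is a minimal sketch of the standard Dice coefficient, 2|A∩B| / (|A| + |B|), for two binary 2D masks. The function name `dice_score` and the convention of returning 1.0 when both masks are empty are assumptions; the paper's exact evaluation protocol is not described in this section.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary 2D masks given as nested lists.

    Assumption: any truthy value counts as foreground, and two empty masks
    are treated as a perfect match (score 1.0).
    """
    pred_flat = [int(bool(v)) for row in pred for v in row]
    target_flat = [int(bool(v)) for row in target for v in row]
    if len(pred_flat) != len(target_flat):
        raise ValueError("masks must contain the same number of pixels")

    # Pixels that are foreground in both masks.
    intersection = sum(p * t for p, t in zip(pred_flat, target_flat))
    total = sum(pred_flat) + sum(target_flat)
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


predicted_mask = [[1, 1], [0, 0]]
reference_mask = [[1, 0], [0, 0]]
score = dice_score(predicted_mask, reference_mask)
```

In this toy case one pixel overlaps, the prediction has two foreground pixels, and the reference has one, so the score is 2·1 / (2 + 1) = 2/3.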
