Cross-Dimensional Medical Self-Supervised Representation Learning Based on a Pseudo-3D Transformation

Fei Gao; Siwen Wang; Fandong Zhang; Hong-Yu Zhou; Yizhou Wang; Churan Wang; Gang Yu; Yizhou Yu

TL;DR

Medical image analysis suffers from data scarcity, especially in 3D modalities. We propose CDSSL-P3D, a cross-dimensional self-supervised framework that uses a pseudo-3D transformation to jointly pre-train on 2D and 3D data, enabling SSL without changing network architecture. By transforming 2D images into pseudo-3D representations and training a 3D encoder with a PCRLv2-based objective that preserves pixels, semantics, and scales, the method leverages large 2D datasets to boost 3D tasks. Experiments across 13 downstream tasks show CDSSL-P3D achieves state-of-the-art performance and notable gains over single-dimension SSL baselines, with architecture-agnostic compatibility across CNNs and Transformers, demonstrating the practical value of cross-dimensional learning for medical imaging.

Abstract

Medical image analysis suffers from a shortage of data, whether annotated or not. The shortage is even more pronounced for 3D medical images. Self-Supervised Learning (SSL) can partially ease this situation by using unlabeled data. However, most existing SSL methods can only use data of a single dimensionality (e.g., 2D or 3D), and cannot enlarge the training dataset by jointly using data of differing dimensionalities. In this paper, we propose a new cross-dimensional SSL framework based on a pseudo-3D transformation (CDSSL-P3D) that can leverage both 2D and 3D data for joint pre-training. Specifically, we introduce an image transformation based on the im2col algorithm, which converts 2D images into a format consistent with 3D data. This transformation enables seamless integration of 2D and 3D data, and facilitates cross-dimensional self-supervised learning for 3D medical image analysis. We run extensive experiments on 13 downstream tasks, including 2D and 3D classification and segmentation. The results indicate that CDSSL-P3D outperforms other advanced SSL methods.

Paper Structure (21 sections, 3 equations, 2 figures, 6 tables)

Introduction
Method
Preliminary: image-to-column transformation (im2col)
Pseudo-3D transformation based on im2col
Notation:
Learning Objective
Network
Experiments
Datasets
Pre-training datasets.
Downstream datasets.
Experimental Details
Pre-training setup.
Downstream training setup.
Results
...and 6 more sections

Figures (2)

Figure 1: The overall CDSSL-P3D framework. In the pre-training stage, 2D images are converted to pseudo-3D images. Then, SSL is performed on the joint pseudo-3D and true 3D data. During the fine-tuning stage, the pre-trained 3D model is primarily used for downstream 3D tasks. As an additional benefit, downstream 2D classification tasks can also be supported: images in such 2D tasks go through our pseudo-3D transformation before being fed into the 3D model.
Figure 2: The proposed pseudo-3D transformation, inspired by im2col for the MCMK problem. (a) Detailed depiction of im2col. The input image $\mathcal{I}$ and convolution kernel $\mathcal{K}$ are first unrolled into matrices $\widehat{\mathcal{I}}$ and $\widehat{\mathcal{K}}$, which are then multiplied to obtain the output. (b) Pseudo-3D transformation. Following the way $\widehat{\mathcal{I}}$ is built, every instance of a sliding window over the entire 2D image $X^{2d}_i$ is unrolled to obtain the pseudo-3D image $X^{p3d}_i$.
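
The two panels of Figure 2 can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (im2col, conv2d_via_im2col, pseudo_3d), the window size k, the stride, and the choice to place the unrolled window along the depth axis are all assumptions made for illustration, and the sketch handles a single channel and a single kernel rather than the full MCMK case.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def im2col(image, k, stride=1):
    """Unroll every k x k sliding window of a 2D image into one column.

    Returns the unrolled matrix of shape (k*k, out_h*out_w) and the
    spatial grid size (out_h, out_w) of the window positions.
    """
    # windows has shape (out_h, out_w, k, k) after applying the stride.
    windows = sliding_window_view(image, (k, k))[::stride, ::stride]
    out_h, out_w = windows.shape[:2]
    cols = windows.reshape(out_h * out_w, k * k).T
    return cols, (out_h, out_w)


def conv2d_via_im2col(image, kernel, stride=1):
    """Panel (a): convolution (cross-correlation) as one matrix product."""
    k = kernel.shape[0]
    cols, (out_h, out_w) = im2col(image, k, stride)
    # The kernel is unrolled into a single row and multiplied with the columns.
    return (kernel.reshape(1, -1) @ cols).reshape(out_h, out_w)


def pseudo_3d(image, k, stride=1):
    """Panel (b): stack the unrolled windows into a volume.

    Assumed layout (hypothetical): the depth axis indexes the position
    inside the window, giving a volume of shape (k*k, out_h, out_w).
    """
    cols, (out_h, out_w) = im2col(image, k, stride)
    return cols.reshape(k * k, out_h, out_w)


if __name__ == "__main__":
    x2d = np.arange(16.0).reshape(4, 4)
    volume = pseudo_3d(x2d, k=2, stride=2)
    print(volume.shape)  # (4, 2, 2)
    print(conv2d_via_im2col(x2d, np.ones((2, 2)), stride=2))
```

Under this layout, each depth slice of the volume is a shifted, subsampled copy of the original image, so a 3D encoder receives an input whose shape matches a true 3D scan. Whether the paper orders the axes this way is not stated in the caption.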
