Cross-D Conv: Cross-Dimensional Transferable Knowledge Base via Fourier Shifting Operation

Mehmet Can Yavuz, Yang Yang

TL;DR

Bridging the gap between abundant 2D biomedical data and scarce 3D data, the work introduces Cross-D Conv, a Fourier-domain weight-rotation mechanism that transfers 2D priors to 3D kernels. It maintains a transferable 3D weight tensor $U_{3D}$ and a learnable rotation $r=(k_x,k_y,k_z,\theta)$, applying a rotation $R(\theta)$ in the frequency domain and a phase adjustment before inverse FFT to obtain rotated weights for cross-dimensional convolution. The approach yields superior or comparable feature quality across 2D and 3D datasets (e.g., IN1K and RadImageNet) with improved efficiency, demonstrated via experiments on ImageNet-like and multimodal volumetric tasks. This enables practical 2D pretraining benefits for 3D medical image analysis and points to hybrid training and optimized segmentation architectures as promising future directions.
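
As a brief reminder of why a rotation and a phase adjustment in the frequency domain can stand in for a spatial rotation and shift of the kernel, the two standard Fourier identities involved are given below. The notation is assumed here for illustration and is not taken from the paper: $U(\mathbf{x})$ is a kernel with Fourier transform $\hat{U}(\mathbf{k})$, $R$ is a rotation matrix, and $\mathbf{s}$ is a spatial shift. Reading $r=(k_x,k_y,k_z,\theta)$ as an axis plus an angle is likewise an assumption.

$$
\mathcal{F}\{U(R^{-1}\mathbf{x})\}(\mathbf{k}) = \hat{U}(R^{-1}\mathbf{k}),
\qquad
\mathcal{F}\{U(\mathbf{x}-\mathbf{s})\}(\mathbf{k}) = e^{-2\pi i\,\mathbf{k}\cdot\mathbf{s}}\,\hat{U}(\mathbf{k}).
$$

The first identity says that rotating the kernel rotates its spectrum by the same $R$. The second says that a shift only changes the phase of each frequency component, which is what makes a shift a smooth, learnable quantity. How the paper combines the two is not specified in this summary.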

Abstract

In biomedical image analysis, the dichotomy between 2D and 3D data presents a significant challenge. While 3D volumes offer superior real-world applicability, they are scarcer for each modality and harder to train on at large scale, whereas 2D samples are abundant but less comprehensive. This paper introduces the Cross-D Conv operation, a novel approach that bridges the dimensional gap by learning a phase shift in the Fourier domain. Our method enables seamless weight transfer between 2D and 3D convolution operations, facilitating cross-dimensional learning. The proposed architecture leverages the abundance of 2D training data to enhance 3D model performance, offering a practical solution to the multimodal data scarcity challenge in 3D medical model pretraining. Experimental validation on RadImageNet (2D) and multimodal volumetric datasets demonstrates that our approach achieves comparable or superior performance in feature quality assessment. The enhanced convolution operation presents new opportunities for developing efficient classification and segmentation models in medical imaging. This work represents an advancement in cross-dimensional and multimodal medical image analysis, offering a robust framework for utilizing 2D priors in 3D model pretraining while maintaining the computational efficiency of 2D training. The code is available at https://github.com/convergedmachine/Cross-D-Conv.

Paper Structure

This paper contains 8 sections, 3 equations, 1 figure, 5 tables, 1 algorithm.

Figures (1)

  • Figure 1: Architectural diagram of the Cross-D Conv operation workflow. The process transforms 2D input tensors through: (1) rotation parameter generation from spatial coordinates, (2) Fourier transform and phase shifting, and (3) projection of 3D convolutional weights onto 2D kernels. Green blocks indicate trainable parameters, while orange blocks represent I/O tensors.
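
Steps (2) and (3) of the caption can be illustrated with a minimal sketch, under several loud assumptions. It is not the authors' implementation (see the linked repository for that): it applies only a phase shift along the depth axis and omits the rotation, it takes the central depth slice as the 2D kernel, and it uses a naive standard-library DFT instead of an FFT library. The names `dft`, `signed_freq`, `shift_depth`, and `project_to_2d` are hypothetical.

```python
import cmath
import math


def dft(seq, inverse=False):
    """Naive O(n^2) discrete Fourier transform of a 1D sequence."""
    n = len(seq)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(seq[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def signed_freq(k, n):
    """Map DFT bin k to its signed integer frequency (0, 1, ..., -2, -1)."""
    return k if k <= (n - 1) // 2 else k - n


def shift_depth(kernel, shift):
    """Circularly shift a D x H x W kernel along depth by `shift` voxels.

    `shift` may be fractional; the result is the real part of the
    inverse transform (exact for odd D, approximate for even D).
    """
    d, h, w = len(kernel), len(kernel[0]), len(kernel[0][0])
    shifted = [[[0.0] * w for _ in range(h)] for _ in range(d)]
    # Shift theorem: a shift by s multiplies bin k by exp(-2*pi*i*k*s/d).
    ramp = [cmath.exp(-2j * math.pi * signed_freq(k, d) * shift / d)
            for k in range(d)]
    for y in range(h):
        for x in range(w):
            spectrum = dft([kernel[z][y][x] for z in range(d)])
            column = dft([c * r for c, r in zip(spectrum, ramp)], inverse=True)
            for z in range(d):
                shifted[z][y][x] = column[z].real
    return shifted


def project_to_2d(kernel, shift):
    """Hypothetical projection: shift along depth, then read the central slice."""
    return shift_depth(kernel, shift)[len(kernel) // 2]


# Toy 3 x 3 x 3 kernel whose depth slice z is filled with the value z.
kernel = [[[float(z)] * 3 for _ in range(3)] for z in range(3)]
kernel_2d = project_to_2d(kernel, 0.25)  # a 3 x 3 kernel read from a shifted 3D tensor
```

Because the shift enters only through the phase ramp, the 2D kernel read from the central slice varies smoothly with `shift`, so in a differentiable framework the same quantity could be treated as a trainable parameter. An integer shift simply selects a neighbouring slice, while fractional values interpolate between slices.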