Cross-D Conv: Cross-Dimensional Transferable Knowledge Base via Fourier Shifting Operation
Mehmet Can Yavuz, Yang Yang
TL;DR
Bridging the gap between abundant 2D biomedical data and scarce 3D data, the work introduces Cross-D Conv, a Fourier-domain weight-rotation mechanism that transfers 2D priors to 3D kernels. It maintains a transferable 3D weight tensor $U_{3D}$ and a learnable rotation $r=(k_x,k_y,k_z,\theta)$. A rotation $R(\theta)$ is applied in the frequency domain, followed by a phase adjustment and an inverse FFT, to obtain rotated weights for cross-dimensional convolution. The approach yields superior or comparable feature quality on 2D datasets (e.g., IN1K and RadImageNet) and 3D volumetric datasets with improved efficiency, as demonstrated by experiments on ImageNet-like and multimodal volumetric tasks. This makes the benefits of 2D pretraining available to 3D medical image analysis, and points to hybrid training and optimized segmentation architectures as promising future directions.
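The rotation step summarized above can be illustrated with a small NumPy sketch. It assumes that $(k_x,k_y,k_z)$ is a rotation axis and $\theta$ an angle (axis-angle form), which the summary does not state explicitly. It realizes "rotate in the frequency domain, adjust the phase, inverse FFT" by evaluating the inverse DFT of the kernel at rotated coordinates (trigonometric interpolation). This is one plausible reading, not the authors' implementation. The function names and the central-slice step that turns the rotated 3D kernel into a 2D kernel are hypothetical.

```python
import numpy as np

def axis_angle_matrix(axis, theta):
    """3x3 rotation by `theta` radians about `axis` (Rodrigues' formula)."""
    k = np.asarray(axis, dtype=float)
    k = k / np.linalg.norm(k)
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def fourier_rotate_kernel(u3d, axis, theta):
    """Rotate a cubic kernel with odd side n about its centre (hypothetical helper).

    The kernel is moved to the frequency domain with an FFT; each output voxel x
    is the inverse DFT evaluated at R^T x, i.e. the spectrum is weighted by
    rotation-dependent phase terms before summing. Cost is O(n^6): small kernels only.
    """
    n = u3d.shape[0]
    assert u3d.shape == (n, n, n) and n % 2 == 1, "expects an odd cubic kernel"
    spec = np.fft.fftn(u3d).reshape(-1)               # forward FFT, flattened (C order)
    f = np.fft.fftfreq(n) * n                         # symmetric integer frequencies
    freqs = np.stack(np.meshgrid(f, f, f, indexing="ij"), -1).reshape(-1, 3)
    c = np.arange(n) - n // 2                         # centred voxel coordinates
    grid = np.stack(np.meshgrid(c, c, c, indexing="ij"), -1).reshape(-1, 3)
    R = axis_angle_matrix(axis, theta)
    src = grid @ R + n // 2                           # rows are R^T x, back in array indices
    phase = np.exp(2j * np.pi * (src @ freqs.T) / n)  # (n^3 outputs) x (n^3 frequencies)
    out = phase @ spec / n**3                         # inverse DFT at the rotated points
    return out.real.reshape(n, n, n)                  # imaginary part is rounding noise

rng = np.random.default_rng(0)
u3d = rng.standard_normal((3, 3, 3))                  # stand-in for the shared weights U_3D
w_rot = fourier_rotate_kernel(u3d, axis=(1.0, 1.0, 0.0), theta=np.pi / 6)
w2d = w_rot[:, :, 1]                                   # hypothetical: central slice as a 2D kernel
```

Because rotating a function rotates its spectrum by the same matrix, evaluating the inverse DFT at $R^\top x$ is equivalent to rotating in the frequency domain. The dense phase matrix is only practical for small kernels such as $3\times3\times3$.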
Abstract
In biomedical image analysis, the dichotomy between 2D and 3D data presents a significant challenge. While 3D volumes offer superior real-world applicability, they are less available for each modality and difficult to train at large scale, whereas 2D samples are abundant but less comprehensive. This paper introduces the Cross-D Conv operation, a novel approach that bridges the dimensional gap by learning phase shifts in the Fourier domain. Our method enables seamless weight transfer between 2D and 3D convolution operations, effectively facilitating cross-dimensional learning. The proposed architecture leverages the abundance of 2D training data to enhance 3D model performance, offering a practical solution to multimodal data scarcity in 3D medical model pretraining. Experimental validation on RadImageNet (2D) and multimodal volumetric datasets demonstrates that our approach achieves comparable or superior performance in feature quality assessment. The enhanced convolution operation opens new opportunities for developing efficient classification and segmentation models in medical imaging. This work advances cross-dimensional and multimodal medical image analysis, offering a robust framework for using 2D priors in 3D model pretraining while retaining the computational efficiency of 2D training. The code is available at https://github.com/convergedmachine/Cross-D-Conv.
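The phase shifting named in the abstract rests on the Fourier shift theorem: translating a signal by $s$ samples multiplies its spectrum by $e^{-2\pi i k s/n}$. The NumPy sketch below (the function name is hypothetical and not taken from the Cross-D Conv code) shows that a phase ramp reproduces an integer `np.roll` exactly. Because $s$ enters the phase continuously, it can also be fractional, which is what would make such a shift learnable by gradient descent.

```python
import numpy as np

def fourier_shift(u, shift):
    """Translate an n-D array by `shift` samples per axis via a spectral phase ramp.

    Exact (up to rounding) for integer shifts; for fractional shifts keep every
    axis length odd so the shifted spectrum stays Hermitian and the result real.
    Shifts wrap around periodically, like np.roll.
    """
    spec = np.fft.fftn(u)
    phase = np.zeros(u.shape)
    for axis, (n, s) in enumerate(zip(u.shape, shift)):
        f = np.fft.fftfreq(n)                            # frequencies in cycles per sample
        shape = [1] * u.ndim
        shape[axis] = n
        phase = phase - 2 * np.pi * s * f.reshape(shape)  # broadcast along this axis
    return np.fft.ifftn(spec * np.exp(1j * phase)).real

u = np.arange(27, dtype=float).reshape(3, 3, 3)   # toy 3x3x3 kernel
shifted = fourier_shift(u, (1, 0, 2))              # integer shift on axes 0 and 2
half = fourier_shift(u, (0.5, 0.0, 0.0))           # fractional shift along axis 0
```

Applying the half-sample shift twice gives the same result as a one-sample roll, which illustrates that the phase parameter composes continuously.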
