HySparK: Hybrid Sparse Masking for Large Scale Medical Image Pre-Training
Fenghe Tang, Ronghao Xu, Qingsong Yao, Xueming Fu, Quan Quan, Heqin Zhu, Zaiyi Liu, S. Kevin Zhou
TL;DR
This work tackles the challenge of pre-training large-scale medical image models without labels by enabling end-to-end pre-training of a CNN-Transformer hybrid. HySparK introduces bottom-up 3D hybrid masking, applies sparse convolution in the CNN stages and encodes only the unmasked patches in the ViT stage, and uses a hierarchical decoder with skip connections to fuse multi-scale features. Extensive experiments on 13 public 3D CT datasets show that HySparK achieves state-of-the-art transfer to BTCV and MSD segmentation tasks, outperforming MAE, SimMIM, SparK, and other baselines. The results point to strong multi-scale representations and good transferability in medical image analysis, and the code is released for reproducibility.
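The bottom-up masking idea can be illustrated with a short NumPy sketch. It assumes that a single random mask is drawn on the coarsest grid (one cell per ViT token) and propagated to each higher-resolution CNN stage by nearest-neighbour upsampling, with each CNN stage doubling the resolution. It also assumes, following SparK, that sparse convolution can be emulated densely by re-zeroing masked voxels after each convolution. The function names, grid sizes, 75% mask ratio and 32-dim token embeddings are hypothetical illustrative choices, not the released HySparK code.

```python
import numpy as np

# Hypothetical sketch: names, grid sizes and the 2x-per-stage scaling are assumptions.

def random_token_mask(grid, mask_ratio, rng):
    """Draw a random 3D mask on the coarsest (ViT token) grid; True = visible."""
    n = int(np.prod(grid))
    n_visible = int(round(n * (1.0 - mask_ratio)))
    visible = np.zeros(n, dtype=bool)
    visible[rng.permutation(n)[:n_visible]] = True
    return visible.reshape(grid)


def upsample_nearest(mask, factor):
    """Nearest-neighbour upsampling of a 3D boolean mask by an integer factor per axis."""
    for axis in range(3):
        mask = np.repeat(mask, factor, axis=axis)
    return mask


def bottom_up_masks(token_grid, mask_ratio, num_cnn_stages, rng):
    """Propagate one token-level mask up to every CNN stage.

    Returns masks ordered from the finest CNN stage down to the ViT token grid,
    so a region hidden at one scale is hidden at every scale.
    """
    token_mask = random_token_mask(token_grid, mask_ratio, rng)
    masks = [token_mask]
    for s in range(1, num_cnn_stages + 1):
        masks.append(upsample_nearest(token_mask, 2 ** s))
    return masks[::-1]


def sparse_stage(features, mask):
    """Dense emulation of sparse convolution: re-zero masked voxels after a conv.

    features: (C, D, H, W); mask: (D, H, W) with True = visible.
    """
    return features * mask[None].astype(features.dtype)


rng = np.random.default_rng(0)
masks = bottom_up_masks(token_grid=(4, 4, 4), mask_ratio=0.75, num_cnn_stages=2, rng=rng)
print([m.shape for m in masks])  # [(16, 16, 16), (8, 8, 8), (4, 4, 4)]

# CNN stage: features at masked voxels are zeroed, so they carry no information.
feat = rng.standard_normal((8, 16, 16, 16))
feat = sparse_stage(feat, masks[0])

# ViT stage: only the visible tokens are encoded (MAE-style).
tokens = rng.standard_normal((masks[-1].size, 32))  # stand-in 32-dim token embeddings
visible_tokens = tokens[masks[-1].reshape(-1)]
print(visible_tokens.shape)  # (16, 32)
```

In this sketch every finer mask is an upsampled copy of the token mask, so a region hidden from the Transformer stage is also hidden from every CNN stage, and no stage can see what another stage is asked to reconstruct.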
Abstract
Generative self-supervised learning strategies exhibit remarkable representation-learning capabilities. However, little attention has been paid to end-to-end pre-training methods based on hybrid CNN-Transformer architectures, which can learn strong local and global representations simultaneously. To address this issue, we propose a generative pre-training strategy called Hybrid Sparse masKing (HySparK), based on masked image modeling, and apply it to large-scale pre-training on medical images. First, we apply a bottom-up 3D hybrid masking strategy to the encoder to keep the masking consistent across the CNN and Transformer stages. We then use sparse convolution in the top CNN stages and encode only the unmasked patches in the bottom vision Transformer stages. Second, we employ a simple hierarchical decoder with skip connections to achieve dense multi-scale feature reconstruction. Third, we implement our pre-training method on a collection of multiple large-scale 3D medical imaging datasets. Extensive experiments indicate that the proposed pre-training strategy transfers robustly to supervised downstream tasks and highlight HySparK's promise. The code is available at https://github.com/FengheTan9/HySparK
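The decoder and reconstruction objective can be sketched in the same way. This NumPy version is a hypothetical illustration under several assumptions: as in SparK, masked positions of each encoder feature map are filled with a mask embedding before decoding; each decoder step is a 2x nearest-neighbour upsampling followed by a 1x1x1 channel projection, a skip-connection addition and a ReLU; and, as in MAE, the loss is a mean squared error over masked voxels only. Random weights stand in for learned parameters, and the channel widths and grid sizes are arbitrary.

```python
import numpy as np

# Hypothetical sketch: layer choices, widths and the mask-embedding fill are assumptions.

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, D, H, W) feature map."""
    for axis in (1, 2, 3):
        x = np.repeat(x, 2, axis=axis)
    return x


def densify(feat, visible, mask_token):
    """Fill masked voxels with a mask embedding of shape (C,) (learned in practice)."""
    return np.where(visible[None], feat, mask_token[:, None, None, None])


def hierarchical_decode(feats, visibles, projs, mask_tokens):
    """Decode coarse-to-fine, fusing each encoder scale through a skip connection.

    feats and visibles are ordered coarsest first, resolution doubling per step;
    projs[i] has shape (C_{i+1}, C_i) and acts as a 1x1x1 convolution.
    """
    x = densify(feats[0], visibles[0], mask_tokens[0])
    for feat, vis, tok, w in zip(feats[1:], visibles[1:], mask_tokens[1:], projs):
        x = np.einsum("oc,cdhw->odhw", w, upsample2x(x))  # upsample, then project channels
        x = np.maximum(x + densify(feat, vis, tok), 0.0)  # add skip connection, then ReLU
    return x


def masked_mse(pred, target, visible):
    """Mean squared reconstruction error over masked (not visible) voxels only."""
    return float(((pred - target) ** 2)[:, ~visible].mean())


rng = np.random.default_rng(0)
token_vis = np.zeros(64, dtype=bool)
token_vis[rng.permutation(64)[:16]] = True           # 25% of a 4x4x4 token grid visible
visibles = [token_vis.reshape(4, 4, 4)]
for _ in range(2):                                    # two finer scales: 8^3 and 16^3
    v = visibles[-1]
    visibles.append(np.repeat(np.repeat(np.repeat(v, 2, 0), 2, 1), 2, 2))

channels = [64, 32, 16]                               # coarsest to finest
feats = [rng.standard_normal((c,) + v.shape) * v for c, v in zip(channels, visibles)]
projs = [0.1 * rng.standard_normal((channels[i + 1], channels[i])) for i in range(2)]
mask_tokens = [0.02 * rng.standard_normal(c) for c in channels]

dense = hierarchical_decode(feats, visibles, projs, mask_tokens)  # (16, 16, 16, 16)
head = 0.1 * rng.standard_normal((1, 16))                         # 1x1x1 output head
pred = np.einsum("oc,cdhw->odhw", head, dense)                    # (1, 16, 16, 16)
target = rng.standard_normal((1, 16, 16, 16))                     # stand-in CT volume
loss = masked_mse(pred, target, visibles[-1])
```

Because `masked_mse` ignores visible voxels, this objective only rewards the sketch for inferring content the encoder could not see, while the skip connections let every encoder scale contribute to the dense reconstruction.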
