Boosting 3D Neuron Segmentation with 2D Vision Transformer Pre-trained on Natural Images
Yik San Cheng, Runkai Zhao, Heng Wang, Hanchuan Peng, Weidong Cai
TL;DR
The paper addresses data scarcity in 3D neuron segmentation by leveraging cross-domain priors from large-scale 2D natural images. It introduces a training paradigm that pre-trains a 2D Vision Transformer (ViT) on natural images via DINO and transfers its weights to a 3D ViT using a tailored 2D-to-3D strategy, enabling data-efficient learning for 3D neuron segmentation. Two transfer variants are explored, Average and Center; Center performs best, and depth information provides additional gains. On the BigNeuron benchmark, the method improves segmentation performance by 8.71% over the model trained from scratch with the same amount of training data. Because the pre-trained weights are used only for initialization, the approach adds no inference overhead, showing that natural-image priors can boost data-limited neuroimaging tasks and support robust 3D neuron morphology reconstruction.
Abstract
Neuron reconstruction, one of the fundamental tasks in neuroscience, rebuilds neuronal morphology from 3D light microscopy imaging data. It plays a critical role in analyzing the structure-function relationship of neurons in the nervous system. However, due to the scarcity of neuron datasets and high-quality SWC annotations, it is still challenging to develop robust segmentation methods for single-neuron reconstruction. To address this limitation, we aim to distill the consensus knowledge from massive natural image data to help the segmentation model learn complex neuron structures. Specifically, we propose a novel training paradigm that leverages a 2D Vision Transformer model pre-trained on large-scale natural images to initialize our Transformer-based 3D neuron segmentation model through a tailored 2D-to-3D weight-transfer strategy. Our method builds a knowledge-sharing connection between the abundant natural image domain and the scarce neuron image domain to improve 3D neuron segmentation in a data-efficient manner. Evaluated on a popular benchmark, BigNeuron, our method improves neuron segmentation performance by 8.71% over the model trained from scratch with the same amount of training samples.
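The text above does not spell out how the Average and Center transfer variants are computed. As a minimal sketch, assuming they follow the two common kernel-inflation schemes for lifting a 2D patch-embedding kernel into 3D (replicate along depth and divide by the depth, or place the 2D kernel at the central depth slice and zero the rest), the idea could look like the code below. The function names, tensor layout, and example shapes are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def inflate_average(w2d: np.ndarray, depth: int) -> np.ndarray:
    """Assumed 'Average' variant: copy the 2D kernel to every depth slice
    and scale by 1/depth, so the slices sum back to the 2D kernel.

    w2d has shape (out_ch, in_ch, kh, kw);
    the result has shape (out_ch, in_ch, depth, kh, kw).
    """
    w3d = np.repeat(w2d[:, :, None, :, :], depth, axis=2)
    return w3d / depth


def inflate_center(w2d: np.ndarray, depth: int) -> np.ndarray:
    """Assumed 'Center' variant: put the 2D kernel at the central depth
    slice and leave all other slices at zero."""
    out_ch, in_ch, kh, kw = w2d.shape
    w3d = np.zeros((out_ch, in_ch, depth, kh, kw), dtype=w2d.dtype)
    w3d[:, :, depth // 2] = w2d
    return w3d


# Hypothetical example: a 16x16 patch-embedding kernel lifted to depth 8.
rng = np.random.default_rng(0)
w2d = rng.standard_normal((4, 1, 16, 16))
w3d_avg = inflate_average(w2d, depth=8)
w3d_ctr = inflate_center(w2d, depth=8)
print(w3d_avg.shape, w3d_ctr.shape)
```

Under these assumptions, the averaged kernel reproduces the 2D response on a volume that is constant along depth, while the centered kernel initially reproduces the 2D response on the middle slice alone and leaves the remaining depth weights to be learned during fine-tuning.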
