
Boosting 3D Neuron Segmentation with 2D Vision Transformer Pre-trained on Natural Images

Yik San Cheng, Runkai Zhao, Heng Wang, Hanchuan Peng, Weidong Cai

TL;DR

The paper tackles data scarcity in 3D neuron segmentation by borrowing cross-domain priors from large-scale 2D natural images. It proposes a training paradigm in which a 2D Vision Transformer pre-trained on natural images (via DINO) initializes a 3D ViT through a tailored 2D-to-3D weight transfer strategy, enabling data-efficient learning for slice-wise segmentation of 3D neuron images. Two transfer variants are explored (Average and Center); Center performs best, and depth information provides additional gains. On the BigNeuron benchmark, the method improves segmentation performance by 8.71% over the same model trained from scratch with the same number of training samples. Because the pre-trained weights are used only for initialization, the approach adds no inference overhead, showing that natural-image priors can benefit data-limited neuroimaging tasks and support robust 3D neuron morphology reconstruction.

Abstract

Neuron reconstruction, one of the fundamental tasks in neuroscience, rebuilds neuronal morphology from 3D light microscopy imaging data. It plays a critical role in analyzing the structure-function relationship of neurons in the nervous system. However, due to the scarcity of neuron datasets and high-quality SWC annotations, it remains challenging to develop robust segmentation methods for single-neuron reconstruction. To address this limitation, we aim to distill the consensus knowledge from massive natural image data to help the segmentation model learn complex neuron structures. Specifically, we propose a novel training paradigm that leverages a 2D Vision Transformer pre-trained on large-scale natural images to initialize our Transformer-based 3D neuron segmentation model through a tailored 2D-to-3D weight transfer strategy. Our method builds a knowledge-sharing connection between the abundant natural image domain and the scarce neuron image domain to improve 3D neuron segmentation in a data-efficient manner. Evaluated on a popular benchmark, BigNeuron, our method improves neuron segmentation performance by 8.71% over the model trained from scratch with the same number of training samples.
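
As an illustration of what a 2D-to-3D weight transfer can look like, the sketch below inflates a 2D patch-embedding kernel into a 3D one in two ways that match the variant names Average and Center mentioned in the TL;DR. This summary does not give the exact definitions, so both are assumptions: "average" replicates the 2D kernel along the new depth axis and divides by the depth, while "center" places the 2D kernel at the middle depth index and fills the remaining slices with zeros. The function name `inflate_patch_embedding` and the tensor layout are hypothetical, not taken from the paper.

```python
import numpy as np


def inflate_patch_embedding(w2d, depth, mode="center"):
    """Inflate a 2D kernel (out, in, kh, kw) to a 3D kernel (out, in, depth, kh, kw).

    Assumed semantics (not confirmed by the source):
      - "average": copy the 2D kernel to every depth index and divide by depth.
      - "center":  copy the 2D kernel to the middle depth index, zeros elsewhere.
    """
    if w2d.ndim != 4:
        raise ValueError("expected a 4D kernel of shape (out, in, kh, kw)")
    if depth < 1:
        raise ValueError("depth must be a positive integer")

    out_ch, in_ch, kh, kw = w2d.shape
    if mode == "average":
        # Every depth slice carries an equal share of the 2D kernel.
        return np.repeat(w2d[:, :, None, :, :], depth, axis=2) / depth
    if mode == "center":
        # Only the middle depth slice carries the 2D kernel.
        w3d = np.zeros((out_ch, in_ch, depth, kh, kw), dtype=w2d.dtype)
        w3d[:, :, depth // 2] = w2d
        return w3d
    raise ValueError(f"unknown mode: {mode!r}")


# Usage with random stand-in weights (no real pre-trained checkpoint is loaded).
rng = np.random.default_rng(0)
w2d = rng.standard_normal((8, 3, 16, 16))
w_avg = inflate_patch_embedding(w2d, depth=4, mode="average")
w_ctr = inflate_patch_embedding(w2d, depth=4, mode="center")
print(w_avg.shape, w_ctr.shape)
```

Under these assumptions, both variants sum to the original 2D kernel along the depth axis, so a 3D input that is constant across depth produces the same response as the 2D model would on a single slice. They differ in how the pre-trained weights are distributed over depth at initialization.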

Paper Structure (5 sections, 1 figure, 1 table)

Figures (1)

  • Figure 1: Overview of the proposed training paradigm for 3D neuron segmentation. The network follows an encoder-decoder structure. The 3D neuron image is first divided into several 3D blocks, which are then fed to a 3D Vision Transformer (ViT) for slice segmentation. During the training phase, the pre-trained weights from a 2D ViT are used to initialize the 3D ViT through a weight transfer strategy. Finally, the segmented slices are stacked together to form the final segmentation prediction, which is then passed to a neuron tracing method to produce the target SWC file.
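
The data flow in the Figure 1 caption (divide the volume into blocks, segment each block, stack the results) can be sketched as follows. This is a minimal illustration under stated assumptions: blocks are taken as consecutive groups of slices along the depth axis, which the caption does not specify, and `segment_block` is a hypothetical thresholding stand-in for the 3D ViT, not the paper's model. The neuron tracing step that produces the SWC file is omitted.

```python
import numpy as np


def split_into_blocks(volume, block_depth):
    """Split a (D, H, W) volume into consecutive blocks of `block_depth` slices."""
    depth = volume.shape[0]
    if depth % block_depth != 0:
        raise ValueError("volume depth must be divisible by block_depth")
    return [volume[i:i + block_depth] for i in range(0, depth, block_depth)]


def segment_block(block, threshold=0.5):
    """Hypothetical stand-in for the 3D ViT: a per-voxel intensity threshold."""
    return (block > threshold).astype(np.uint8)


def segment_volume(volume, block_depth, segment_fn=segment_block):
    """Segment each block, then stack the predicted slices back along depth."""
    blocks = split_into_blocks(volume, block_depth)
    masks = [segment_fn(block) for block in blocks]
    return np.concatenate(masks, axis=0)


# Usage on a synthetic volume with values in [0, 1].
volume = np.random.default_rng(0).random((8, 32, 32))
mask = segment_volume(volume, block_depth=4)
print(mask.shape, mask.dtype)
```

Passing the segmentation step as `segment_fn` keeps the block handling independent of the model, so a real network's forward pass could replace the threshold without changing the splitting and stacking logic.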