FGNet: Leveraging Feature-Guided Attention to Refine SAM2 for 3D EM Neuron Segmentation
Zhenghua Li, Hang Chen, Zihao Sun, Kai Li, Xiaolin Hu
TL;DR
FGNet addresses neuron segmentation in electron microscopy (EM) volumes by transferring representations from SAM2, which is pre-trained on natural images, to the EM domain. It introduces a Feature-Guided Attention (FGA) module that steers a lightweight Fine-Grained Encoder (FGE), allowing the model to recover fine structural details that SAM2 may miss due to downsampling. Two affinity decoders produce a coarse affinity map ($A_s$) and a refined affinity map ($A_r$), and the final segmentation is obtained by applying watershed to $A_r$. With the SAM2 weights frozen, FGNet performs comparably to prior SOTA methods; after fine-tuning on EM data, it surpasses them. This cross-domain transfer, combined with targeted domain-adaptive guidance, offers a practical route to high-precision 3D EM neuron segmentation with limited EM annotations.
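To make the affinity-to-segmentation step concrete, here is a minimal NumPy sketch. It rests on several assumptions not stated in the summary: affinities are taken to be nearest-neighbour (one channel per axis), a ground-truth label volume stands in for the predicted $A_r$, and thresholded connected components via union-find replace the watershed step. The function names `labels_to_affinities` and `affinities_to_segments` are hypothetical.

```python
import numpy as np

def _shifted(axis):
    """Index tuples for each voxel and its predecessor along `axis`."""
    cur = [slice(None)] * 3
    prev = [slice(None)] * 3
    cur[axis] = slice(1, None)
    prev[axis] = slice(None, -1)
    return tuple(cur), tuple(prev)

def labels_to_affinities(labels):
    """Nearest-neighbour affinities, shape (3, z, y, x).

    aff[c] is 1 where a voxel and its predecessor along axis c share the
    same non-zero label, and 0 elsewhere (assumed definition).
    """
    aff = np.zeros((3,) + labels.shape, dtype=np.float32)
    for axis in range(3):
        cur, prev = _shifted(axis)
        aff[(axis,) + cur] = (labels[cur] == labels[prev]) & (labels[cur] > 0)
    return aff

def affinities_to_segments(aff, threshold=0.5):
    """Merge voxels linked by an affinity above `threshold`.

    This is plain connected components, a simplification of watershed.
    """
    shape = aff.shape[1:]
    parent = list(range(int(np.prod(shape))))

    def find(i):
        # Union-find lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    idx = np.arange(len(parent)).reshape(shape)
    for axis in range(3):
        cur, prev = _shifted(axis)
        strong = aff[(axis,) + cur] > threshold
        for a, b in zip(idx[cur][strong].tolist(), idx[prev][strong].tolist()):
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb

    roots = np.array([find(i) for i in range(len(parent))])
    _, seg = np.unique(roots, return_inverse=True)
    seg = seg.ravel() + 1
    sizes = np.bincount(seg)
    seg[sizes[seg] == 1] = 0  # isolated voxels are treated as background
    return seg.reshape(shape)

# Toy volume: two objects separated by a one-voxel background gap
labels = np.zeros((2, 3, 5), dtype=np.int64)
labels[:, :, :2] = 7
labels[:, :, 3:] = 9
aff = labels_to_affinities(labels)
seg = affinities_to_segments(aff)
```

On the toy volume, the two objects come back as two distinct segments and the gap between them stays background. A real pipeline would apply a watershed implementation to the predicted affinities instead.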
Abstract
Accurate segmentation of neural structures in Electron Microscopy (EM) images is paramount for neuroscience. However, the task is made difficult by intricate morphologies, low signal-to-noise ratios, and scarce annotations, which limit the accuracy and generalization of existing methods. To address these challenges, we leverage the priors that visual foundation models learn from large collections of natural images. Specifically, we propose a novel framework that effectively transfers knowledge from Segment Anything 2 (SAM2), which is pre-trained on natural images, to the EM domain. We first use SAM2 to extract powerful, general-purpose features. To bridge the domain gap, we introduce a Feature-Guided Attention module that leverages semantic cues from SAM2 to guide a lightweight encoder, the Fine-Grained Encoder (FGE), in focusing on challenging regions. Finally, a dual-affinity decoder generates both coarse and refined affinity maps. Experimental results demonstrate that our method achieves performance comparable to state-of-the-art (SOTA) approaches with the SAM2 weights frozen. After further fine-tuning on EM data, it significantly outperforms existing SOTA methods. This study shows that representations pre-trained on natural images, when combined with targeted domain-adaptive guidance, can effectively address the specific challenges of neuron segmentation.
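The abstract does not specify how the Feature-Guided Attention module is built, so the following is only an illustrative sketch of one generic way coarse features can guide a finer encoder: spatial attention gating. Everything here is an assumption, including the 1x1 projection `proj`, the sigmoid gate, the nearest-neighbour upsampling, the residual form, and the use of 2D arrays for brevity; it should not be read as the authors' design.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def feature_guided_gate(coarse, fine, proj, bias=0.0):
    """Gate fine features with a spatial map derived from coarse features.

    coarse: (C_c, h, w) low-resolution features (stand-in for SAM2 output)
    fine:   (C_f, H, W) high-resolution features (stand-in for FGE output),
            where H and W are integer multiples of h and w
    proj:   (C_c,) weights of a hypothetical 1x1 projection
    """
    # Collapse coarse channels to one attention value per location: (h, w)
    attn = sigmoid(np.tensordot(proj, coarse, axes=1) + bias)
    # Nearest-neighbour upsampling to the fine resolution: (H, W)
    ry = fine.shape[1] // attn.shape[0]
    rx = fine.shape[2] // attn.shape[1]
    attn = np.repeat(np.repeat(attn, ry, axis=0), rx, axis=1)
    # Residual gating: highlighted locations are amplified, none are zeroed
    return fine * (1.0 + attn[None])

# Toy input: the coarse features strongly highlight the top-left cell
coarse = np.zeros((4, 2, 2))
coarse[:, 0, 0] = 10.0
fine = np.ones((3, 4, 4))
out = feature_guided_gate(coarse, fine, proj=np.ones(4))
```

In this toy case the fine features under the highlighted coarse cell are scaled by roughly 2, while the rest are scaled by 1.5, since a zero input to the sigmoid gives 0.5. A learned module would replace the fixed `proj` with trained weights.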
