FGNet: Leveraging Feature-Guided Attention to Refine SAM2 for 3D EM Neuron Segmentation

Zhenghua Li, Hang Chen, Zihao Sun, Kai Li, Xiaolin Hu

TL;DR

FGNet addresses EM neuron segmentation by transferring representations from SAM2, which is pre-trained on natural images, to the EM domain. It introduces a Feature-Guided Attention (FGA) module to steer a lightweight Fine-Grained Encoder (FGE), allowing the model to recover fine structural details that SAM2 may miss due to downsampling. The framework uses two affinity decoders to produce a coarse ($A_s$) and a refined ($A_r$) affinity map, and the final segmentation is obtained via watershed on $A_r$. With SAM2 frozen, the method performs comparably to prior SOTA methods; after EM-domain fine-tuning, it surpasses them. This cross-domain transfer, combined with targeted domain-adaptive guidance, offers a practical route for high-precision 3D EM neuron segmentation with limited EM annotations.
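
To make the step from an affinity map to a segmentation concrete, the following is a minimal sketch and not the authors' pipeline. It replaces watershed with a simpler stand-in: nearest-neighbour affinities are thresholded and connected pixels are merged with union-find. The function name `segment_from_affinities`, the 2D grid, the affinity layout, and the threshold of 0.5 are all assumptions made for illustration.

```python
def segment_from_affinities(aff_y, aff_x, height, width, threshold=0.5):
    """Label a 2D grid by merging neighbours with affinity above `threshold`.

    Hypothetical layout (an assumption, not taken from the paper):
      aff_y[i][j] is the affinity between pixel (i, j) and pixel (i + 1, j)
      aff_x[i][j] is the affinity between pixel (i, j) and pixel (i, j + 1)
    """
    parent = list(range(height * width))

    def find(i):
        # Follow parent pointers to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[max(root_a, root_b)] = min(root_a, root_b)

    for i in range(height):
        for j in range(width):
            idx = i * width + j
            if i + 1 < height and aff_y[i][j] > threshold:
                union(idx, idx + width)  # merge with the pixel below
            if j + 1 < width and aff_x[i][j] > threshold:
                union(idx, idx + 1)      # merge with the pixel to the right

    # Relabel roots to consecutive segment ids 1..K in raster order.
    ids = {}
    labels = []
    for i in range(height):
        row = []
        for j in range(width):
            root = find(i * width + j)
            row.append(ids.setdefault(root, len(ids) + 1))
        labels.append(row)
    return labels


# Toy 2 x 4 grid: low horizontal affinity between columns 1 and 2
# stands in for a membrane separating two neurons.
aff_y = [[0.9, 0.9, 0.8, 0.9]]
aff_x = [[0.9, 0.1, 0.9],
         [0.8, 0.2, 0.9]]
labels = segment_from_affinities(aff_y, aff_x, height=2, width=4)
print(labels)  # [[1, 1, 2, 2], [1, 1, 2, 2]]
```

A real 3D pipeline would add a z-direction affinity channel and typically use watershed with agglomeration rather than a single global threshold. The thresholded version only shows why sharper affinities at boundaries lead to fewer merge errors.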

Abstract

Accurate segmentation of neural structures in Electron Microscopy (EM) images is paramount for neuroscience. However, this task is challenged by intricate morphologies, low signal-to-noise ratios, and scarce annotations, limiting the accuracy and generalization of existing methods. To address these challenges, we seek to leverage the priors learned by visual foundation models on a vast amount of natural images to better tackle this task. Specifically, we propose a novel framework that can effectively transfer knowledge from Segment Anything 2 (SAM2), which is pre-trained on natural images, to the EM domain. We first use SAM2 to extract powerful, general-purpose features. To bridge the domain gap, we introduce a Feature-Guided Attention module that leverages semantic cues from SAM2 to guide a lightweight encoder, the Fine-Grained Encoder (FGE), in focusing on these challenging regions. Finally, a dual-affinity decoder generates both coarse and refined affinity maps. Experimental results demonstrate that our method achieves performance comparable to state-of-the-art (SOTA) approaches with the SAM2 weights frozen. Upon further fine-tuning on EM data, our method significantly outperforms existing SOTA methods. This study validates that transferring representations pre-trained on natural images, when combined with targeted domain-adaptive guidance, can effectively address the specific challenges in neuron segmentation.

Paper Structure

This paper contains 22 sections, 4 equations, 4 figures, 7 tables.

Figures (4)

  • Figure 1: Network architecture. (a) Four core components: SAM2 Encoder, Feature-Guided Attention (FGA), Fine-Grained Encoder (FGE), and a pair of Affinity Decoders. Modules of different types are represented with distinct colors. The left column indicates the size of each feature in the horizontal dimension. (b) Detailed structure of the FGA module. The plus sign ($\oplus$) represents element-wise addition, and the multiplication sign ($\otimes$) represents element-wise multiplication. (c) Detailed structure of the DW Block.
  • Figure 2: 2D visualization of segmentation results obtained from different methods. For each result, the first row presents a 2D slice and the second row shows two zoomed-in regions. Best viewed in digital with zoom-in.
  • Figure 3: 3D visualization of segmentation results obtained from different methods. The 3D neuron morphology is shown, with red arrows indicating some of the slender dendrites in the neuron.
  • Figure 4: Hierarchical attention visualization for 3D EM neuron segmentation. Rows show sequential 2D slices. Columns 1–2 display input EM volumes and ground truth segmentations. Columns 3–5 display attention maps ($a_1$, $a_2$, $a_3$) overlaid on EM data, with warmer jet colormap colors indicating higher attention weights. A vertical color bar quantifies attention intensity, highlighting critical structural details like neuron boundaries.
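
The Figure 1(b) caption describes the FGA module only in terms of element-wise addition ($\oplus$) and element-wise multiplication ($\otimes$); the exact layers are not given in this summary. As a rough illustration of how those two operations can produce a spatial attention map like the $a_1$, $a_2$, $a_3$ maps in Figure 4, here is a generic additive attention gate in NumPy. Everything in it is an assumption: the function name `attention_gate`, the single learned projection `w`, the sigmoid, and the premise that the guiding features have already been resampled to the fine features' shape.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def attention_gate(guide, fine, w, b=0.0):
    """Generic additive attention gate (illustrative, not the paper's FGA).

    guide: (C, H, W) guiding features, e.g. from a frozen backbone
    fine:  (C, H, W) features from a lightweight encoder
    w:     (C,) hypothetical projection collapsing channels to one map
    """
    fused = guide + fine                              # element-wise addition
    logits = np.einsum("c,chw->hw", w, fused) + b     # one value per pixel
    attn = sigmoid(logits)                            # weights in (0, 1)
    gated = fine * attn[None, :, :]                   # element-wise multiplication
    return gated, attn


rng = np.random.default_rng(0)
guide = rng.normal(size=(4, 8, 8))
fine = rng.normal(size=(4, 8, 8))
w = rng.normal(size=4)

gated, attn = attention_gate(guide, fine, w)
print(gated.shape, attn.shape)  # (4, 8, 8) (8, 8)
```

The gate rescales the fine features pixel by pixel, so locations where the fused features produce a high logit pass through almost unchanged while others are suppressed. An attention map of this kind can be overlaid on the input slice in the manner of Figure 4.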