MS2Edge: Towards Energy-Efficient and Crisp Edge Detection with Multi-Scale Residual Learning in SNNs

Yimeng Fan, Changsong Liu, Mingyang Li, Yuzhou Dai, Yanyan Liu, Wei Zhang

TL;DR

This paper tackles two problems of conventional ANN-based edge detectors, high energy consumption and poor edge crispness, by introducing MS2Edge, the first SNN-based edge detector. It combines a multi-scale residual spiking backbone (MS2ResNet) built on Membrane-based Deformed Shortcuts (MDS) and I-LIF neurons, a Spiking Multi-Scale Upsample Block (SMSUB) for detail reconstruction, and Membrane Average Decoding (MAD) for integrating edge maps across multiple time steps, all trained from scratch via surrogate gradients. The model achieves state-of-the-art results on BSDS500, NYUDv2, BIPED, PLDU, and PLDM without pre-trained backbones, while maintaining ultralow energy consumption thanks to the spike-driven computation and quantization priors of SNNs. This work demonstrates that SNNs can deliver both crisp, high-quality edge maps and substantial energy savings, paving the way for efficient edge detection in neuromorphic and real-time settings, with potential extensions to other low-level vision tasks. The energy model scales with the number of time steps $T$ and the firing activity $fr$ as $E_{SNNs}=T\times\left(fr\times E_{ACs}\times \eta_{ACs}+E_{MACs}\times \eta_{MACs}\right)$, which makes explicit how temporal dynamics affect efficiency.
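To make the energy formula above concrete, here is a minimal Python sketch that evaluates it directly. It rests on assumptions not stated in this summary: $E_{ACs}$ and $E_{MACs}$ are read as per-operation energies, $\eta_{ACs}$ and $\eta_{MACs}$ as operation counts, and the default values of 0.9 pJ per accumulate and 4.6 pJ per multiply-accumulate are the commonly cited 45 nm estimates, which the paper may or may not use. The function name `snn_energy_pj` and the layer numbers in the example are hypothetical.

```python
# Assumed per-operation energies in picojoules (commonly cited 45 nm
# estimates); these values are NOT taken from the summary above.
E_AC_PJ = 0.9   # one accumulate (AC)
E_MAC_PJ = 4.6  # one multiply-accumulate (MAC)


def snn_energy_pj(time_steps, firing_rate, num_acs, num_macs,
                  e_ac=E_AC_PJ, e_mac=E_MAC_PJ):
    """Evaluate E_SNNs = T * (fr * E_ACs * eta_ACs + E_MACs * eta_MACs).

    Interpretation (an assumption): num_acs and num_macs play the role of
    eta_ACs and eta_MACs, i.e. operation counts per time step.
    """
    if time_steps < 1:
        raise ValueError("time_steps must be at least 1")
    if firing_rate < 0:
        raise ValueError("firing_rate must be non-negative")
    spike_driven = firing_rate * e_ac * num_acs  # sparse, spike-driven part
    dense = e_mac * num_macs                     # dense part, e.g. input encoding
    return time_steps * (spike_driven + dense)


if __name__ == "__main__":
    # Hypothetical operation counts, chosen only for illustration.
    for t in (1, 2, 4):
        energy = snn_energy_pj(t, firing_rate=0.5, num_acs=1000, num_macs=100)
        print(f"T={t}: {energy:.1f} pJ")
```

Under these assumptions the estimate grows linearly with $T$, so halving the number of time steps halves the energy, while lowering the firing activity only reduces the accumulate term.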

Abstract

Edge detection with Artificial Neural Networks (ANNs) has achieved remarkable progress but faces two major challenges. First, it requires pre-training on large-scale extra data and complex designs for prior knowledge, leading to high energy consumption. Second, the predicted edges lack crispness and rely heavily on post-processing. Spiking Neural Networks (SNNs), as third-generation neural networks, feature quantization and spike-driven computation mechanisms. They inherently provide a strong prior for edge detection in an energy-efficient manner, while their quantization mechanism helps suppress texture artifacts around true edges, improving prediction crispness. However, the resulting quantization error inevitably introduces sparse edge discontinuities, which limits further gains in crispness. To address these challenges, we propose MS2Edge, the first SNN-based model for edge detection. At its core, we build a novel spiking backbone named MS2ResNet that integrates multi-scale residual learning to recover missing boundary lines and generate crisp edges, while combining I-LIF neurons with the Membrane-based Deformed Shortcut (MDS) to mitigate quantization errors. The model is complemented by a Spiking Multi-Scale Upsample Block (SMSUB) for detail reconstruction during upsampling and a Membrane Average Decoding (MAD) method for effective integration of edge maps across multiple time steps. Experimental results demonstrate that MS2Edge outperforms ANN-based methods and achieves state-of-the-art performance on the BSDS500, NYUDv2, BIPED, PLDU, and PLDM datasets without pre-trained backbones, while maintaining ultralow energy consumption and generating crisp edge maps without post-processing.

Paper Structure

This paper contains 37 sections, 33 equations, 13 figures, and 11 tables.

Figures (13)

  • Figure 1: Visualization of the quantization mechanism in SNNs and the crispness of the proposed MS2Edge. The images are normalized to the range of 0 to 1 and then fed directly into an I-LIF spiking neuron to demonstrate the quantization mechanism of SNNs. For the I-LIF neuron, T is set to 1 and D to 4.
  • Figure 2: The quantization error for I-LIF and LIF neurons when processing identical inputs.
  • Figure 3: The architecture of the proposed MS2Edge. It consists of the backbone, bottleneck module, skip module, decoder, and prediction module. The backbone uses the MS2ResNet26 configuration, detailed in Appendix B. MS2ResNet consists of MS2Block1 and MS2Block2, with the latter using a $3\times3$ convolution in the MDS for downsampling to better incorporate spatial information. In the decoder, SMSUB performs upsampling, and its output is fused with backbone features that have been processed by the Skip Block. Finally, the Prediction Block produces the edge prediction.
  • Figure 4: The visualization of gradient norm in MS2ResNet.
  • Figure 5: The architecture of SMSUB, where d represents the dilation rate.
  • ...and 8 more figures