Msmsfnet: a multi-stream and multi-scale fusion net for edge detection

Chenguang Liu, Chisheng Wang, Feifei Dong, Xiayang Xiao, Xin Su, Chuanhua Zhu, Dejin Zhang, Qingquan Li

TL;DR

This work addresses edge detection without relying on ImageNet pre-training by introducing msmsfnet, a multi-stream, multi-scale fusion network that enhances representation richness through parallel branches and spatially asymmetric convolutions. The model uses deep supervision with side outputs and a fused final map, trained with a class-balanced objective to handle edge-vs-non-edge imbalance. Empirical results show msmsfnet achieves state-of-the-art performance on BIPEDv2, BSDS500, and NYUDv2 when trained from scratch, and remains competitive when pre-trained weights are used, with SAR experiments illustrating robust performance in challenging noisy imagery. The findings highlight the importance of training data scale and architectural design that emphasizes multi-scale fusion, suggesting new directions for edge detection research in data-constrained domains. Overall, the approach demonstrates that strong edge-detection performance is achievable without large-scale pre-training, while still benefiting from pre-training when available.
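The class-balanced objective and deep supervision mentioned above can be illustrated with a minimal sketch. This summary does not give the exact loss, so the code below assumes the class-balanced binary cross-entropy commonly used in deeply supervised edge detectors (the HED-style weighting); the function names `class_balanced_bce` and `deeply_supervised_loss`, the flat-list inputs, and the equal weighting of side outputs and fused map are all illustrative assumptions, not the authors' implementation.

```python
import math


def class_balanced_bce(probs, labels, eps=1e-12):
    """Class-balanced binary cross-entropy over flat lists (assumed HED-style form).

    probs  : predicted edge probabilities, one per pixel
    labels : ground-truth labels, 1 for edge and 0 for non-edge
    """
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    # Weight each class by the frequency of the *other* class, so the rare
    # edge pixels are not swamped by the abundant background pixels.
    w_pos = n_neg / n
    w_neg = n_pos / n
    loss = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        if y == 1:
            loss -= w_pos * math.log(p)
        else:
            loss -= w_neg * math.log(1.0 - p)
    return loss


def deeply_supervised_loss(side_outputs, fused_output, labels):
    """Sum the balanced loss over every side output plus the fused map.

    Equal weighting of all terms is an assumption made for this sketch.
    """
    total = sum(class_balanced_bce(s, labels) for s in side_outputs)
    return total + class_balanced_bce(fused_output, labels)
```

With one edge pixel among four and every prediction at 0.5, the single edge pixel and the three background pixels contribute the same total to the loss, which is the balancing effect the weighting is meant to achieve.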

Abstract

Edge detection is a long-standing problem in computer vision. Despite the efficiency of existing algorithms, their performance relies heavily on backbone weights pre-trained on the ImageNet dataset. This reliance makes it significantly harder to design new edge detection models that do not build on existing, well-trained ImageNet models, because pre-training a model on ImageNet is expensive and becomes compulsory to ensure a fair comparison. Moreover, the pre-training and fine-tuning strategy is not always useful and is sometimes not even available. For instance, weights pre-trained on ImageNet are unlikely to help edge detection in Synthetic Aperture Radar (SAR) images, because the statistics of optical images and SAR images differ strongly, and no SAR dataset is comparable in size to ImageNet. In this work, we study the performance that state-of-the-art deep-learning-based edge detectors achieve on publicly available datasets when they are trained from scratch, and we devise a new network architecture for edge detection, the multi-stream and multi-scale fusion net (msmsfnet). Our experiments show that, when all models are trained from scratch, our model outperforms state-of-the-art edge detectors on three publicly available datasets. We also demonstrate the efficiency of our model for edge detection in SAR images, where no useful pre-trained weights are available. Finally, we show that our model achieves competitive performance on the BSDS500 dataset when pre-trained weights are used.

Paper Structure (19 sections, 5 equations, 10 figures, 5 tables)

This paper contains 19 sections, 5 equations, 10 figures, 5 tables.

Figures (10)

  • Figure 1: The proposed msmsfblock. The expression in parentheses indicates the number of output filters of the layer.
  • Figure 2: Network architecture of the proposed msmsfnet. The number in parentheses indicates the number of output filters of the layer.
  • Figure 3: The precision-recall curves computed by different methods on the test set of the BIPEDv2 dataset.
  • Figure 4: Edge detection results computed by different edge detectors on a test image from the BIPEDv2 dataset.
  • Figure 5: The precision-recall curves computed by different methods on the test set of the BSDS500 dataset.
  • ...and 5 more figures