Compact Twice Fusion Network for Edge Detection

Yachuan Li, Zongmin Li, Xavier Soria P., Chaozhi Yang, Qian Xiao, Yun Bai, Hua Li, Xiangdong Wang

TL;DR

CTFN tackles edge detection by integrating multi-scale features through two lightweight fusion modules and a dynamic focal loss to handle hard samples. The Semantic Enhancement Module expands the receptive field of fine-scale features, while the Pseudo Pixel-level Weighting module flexibly fuses multi-scale maps with decoupled channel and spatial weights. Dynamic Focal Loss reshapes cross-entropy and adapts weights over training to emphasize hard samples, improving edge localization and reducing texture interference. Evaluations on BSDS500, NYUDv2, and BIPEDv2 show competitive accuracy with far fewer parameters and lower computational cost, making CTFN well-suited for practical edge-detection tasks.

Abstract

The significance of multi-scale features has been gradually recognized by the edge detection community. However, fusing multi-scale features increases model complexity, which hinders practical application. In this work, we propose a Compact Twice Fusion Network (CTFN) to fully integrate multi-scale features while keeping the model compact. CTFN includes two lightweight multi-scale feature fusion modules: a Semantic Enhancement Module (SEM) that uses the semantic information contained in coarse-scale features to guide the learning of fine-scale features, and a Pseudo Pixel-level Weighting (PPW) module that aggregates the complementary merits of multi-scale features by assigning weights to all features. Even so, interference from texture noise still makes some pixels hard to classify correctly. For these hard samples, we propose a novel loss function, coined Dynamic Focal Loss, which reshapes the standard cross-entropy loss and dynamically adjusts the weights to correct the distribution of hard samples. We evaluate our method on three datasets: BSDS500, NYUDv2, and BIPEDv2. Compared with state-of-the-art methods, CTFN achieves competitive accuracy with fewer parameters and lower computational cost. Apart from the backbone, CTFN requires only 0.1M additional parameters, which reduces its computation cost to just 60% of that of other state-of-the-art methods. The code is available at https://github.com/Li-yachuan/CTFN-pytorch-master.
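To make the phrase "reshapes the standard cross-entropy loss and dynamically adjusts the weights" more concrete, the following is a minimal sketch of the standard focal-style reweighting of binary cross-entropy, paired with a hypothetical schedule that changes the focusing parameter over training. It is not the paper's Dynamic Focal Loss, whose exact form is not given in this summary; the names `focal_bce` and `gamma_schedule`, the linear schedule, and the default `gamma_max=2.0` are all assumptions made for illustration.

```python
import math


def focal_bce(p, y, gamma=2.0, eps=1e-7):
    """Focal-style binary cross-entropy for a single pixel.

    p: predicted edge probability in [0, 1]; y: ground-truth label (0 or 1).
    With gamma = 0 this reduces to plain cross-entropy; larger gamma
    down-weights pixels that are already classified confidently.
    """
    p = min(max(p, eps), 1.0 - eps)      # clamp to avoid log(0)
    pt = p if y == 1 else 1.0 - p        # probability given to the true class
    return -((1.0 - pt) ** gamma) * math.log(pt)


def gamma_schedule(epoch, total_epochs, gamma_max=2.0):
    """Hypothetical schedule (an assumption, not from the paper):
    gamma grows linearly from 0 at the first epoch to gamma_max at the last."""
    if total_epochs <= 1:
        return gamma_max
    return gamma_max * epoch / (total_epochs - 1)


if __name__ == "__main__":
    for epoch in (0, 5, 9):
        g = gamma_schedule(epoch, 10)
        easy = focal_bce(0.9, 1, gamma=g)   # confidently correct edge pixel
        hard = focal_bce(0.1, 1, gamma=g)   # edge pixel the model misses
        print(epoch, round(g, 3), hard / easy)
```

Under this sketch, the ratio between the loss of a hard pixel and that of an easy pixel grows as `gamma` increases, which is the sense in which a focal-style loss emphasizes hard samples.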

Paper Structure (19 sections, 7 equations, 11 figures, 6 tables, 1 algorithm)

This paper contains 19 sections, 7 equations, 11 figures, 6 tables, 1 algorithm.

Figures (11)

  • Figure 1: Illustration of the hard samples. The locations of the hard samples are marked with boxes. (a) An image from the BSDS500 dataset, (b) the result predicted by our method, (c) the corresponding ground truth.
  • Figure 2: Comparison of common structures in edge detection. (a) HED-based method, (b) UNet-based method, (c) Multiple Feature Fusion (MFF) method.
  • Figure 3: Network architecture of CTFN. The input is an image of arbitrary size, and the output is an edge map of the same size as the input. The first feature fusion stage is denoted by $1^{st}$ fusion, and $2^{nd}$ fusion denotes the second feature fusion stage.
  • Figure 4: Pseudo Pixel-level Weighting module. The input of the PPW module is the set of multi-scale feature maps, and the output is the final edge map. "5" is the number of multi-scale feature maps; $H$ and $W$ are the height and width of the image, which are also the height and width of the PPW module's input and output.
  • Figure 5: Precision–Recall curves of CTFN compared with other existing works on the BSDS500 dataset.
  • ...and 6 more figures
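The TL;DR describes the PPW module as fusing multi-scale maps "with decoupled channel and spatial weights", and the Figure 4 caption gives its input as 5 maps of size $H \times W$. One way to read that, sketched below purely as an assumption, is that the per-pixel weight of scale `s` at position `(i, j)` is factored into a per-scale weight times a per-position weight, so that far fewer weights are needed than with a full pixel-level weight tensor. The function names `fuse_rank1` and `weight_counts` and the factorization itself are hypothetical; the module's actual design is defined in the paper and its repository, not here.

```python
def fuse_rank1(maps, channel_w, spatial_w):
    """Fuse S maps of size H x W into one H x W map.

    Assumed weighting (illustrative only):
        weight[s][i][j] = channel_w[s] * spatial_w[i][j]
    maps:      list of S maps, each a list of H rows with W floats
    channel_w: list of S per-scale weights
    spatial_w: H x W per-position weights
    """
    num_scales, height, width = len(maps), len(maps[0]), len(maps[0][0])
    if len(channel_w) != num_scales:
        raise ValueError("one channel weight is needed per scale")
    fused = [[0.0] * width for _ in range(height)]
    for s in range(num_scales):
        for i in range(height):
            for j in range(width):
                fused[i][j] += channel_w[s] * spatial_w[i][j] * maps[s][i][j]
    return fused


def weight_counts(num_scales, height, width):
    """Number of weights for full pixel-level weighting vs. the factored form."""
    return num_scales * height * width, num_scales + height * width


if __name__ == "__main__":
    # Five constant 2 x 3 maps with values 1..5, uniform weights.
    maps = [[[float(s + 1)] * 3 for _ in range(2)] for s in range(5)]
    fused = fuse_rank1(maps, [0.2] * 5, [[1.0] * 3 for _ in range(2)])
    print(fused)
    print(weight_counts(5, 2, 3))
```

With uniform weights this reduces to a plain average of the five maps; the factored form stores `S + H*W` weights instead of `S*H*W`.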