Cycle Pixel Difference Network for Crisp Edge Detection

Changsong Liu, Wei Zhang, Yanyan Liu, Mingyang Li, Wenlin Li, Yimeng Fan, Xiangnan Bai, Liang Zhang

TL;DR

Cycle Pixel Difference Network (CPD-Net) targets two persistent problems in deep edge detection: reliance on large-scale pre-trained weights and the generation of thick edges. It introduces Cycle Pixel Difference Convolution (CPDC), which encodes edge priors from four directions (horizontal, vertical, diagonal, and cross) and thereby enables training from scratch. CPDC is paired with a Multi-scale Information Enhancement Module (MSEM) and a Dual Residual Connection (DRC) decoder to sharpen edge localization. A Hybrid Focal Loss combines a focal Tversky loss with a focal loss to address the imbalance between edge and non-edge pixels, further improving contour fidelity. Without pre-training, CPD-Net reports competitive accuracy and crispness on BSDS500 (ODS=0.813, AC=0.352), NYUD-V2 (ODS=0.760, AC=0.223), BIPED (ODS=0.898, AC=0.426), and CID (ODS=0.59) while keeping the model compact. The result is a resource-efficient edge detector whose predictions adhere closely to ground-truth contours, with potential for real-time use in constrained environments.
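
To make the idea of combining a focal Tversky term with a focal term concrete, here is a minimal sketch in plain Python. It is not the authors' formulation: the function names, the weights `alpha`, `fn_weight`, `fp_weight`, the exponents `gamma`, and the mixing factor `balance` are all assumptions chosen for illustration, and the two terms are simply added.

```python
import math

def focal_loss(probs, labels, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean binary focal loss; alpha and gamma are assumed defaults."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        if y == 1:
            # Edge pixel: down-weight predictions that are already confident.
            total += -alpha * (1.0 - p) ** gamma * math.log(p)
        else:
            # Non-edge pixel: same idea, applied to the background class.
            total += -(1.0 - alpha) * p ** gamma * math.log(1.0 - p)
    return total / len(probs)

def focal_tversky_loss(probs, labels, fn_weight=0.7, fp_weight=0.3,
                       gamma=0.75, eps=1e-7):
    """(1 - Tversky index) ** gamma on soft counts; all weights are assumed."""
    tp = sum(p * y for p, y in zip(probs, labels))
    fn = sum((1.0 - p) * y for p, y in zip(probs, labels))
    fp = sum(p * (1 - y) for p, y in zip(probs, labels))
    tversky = (tp + eps) / (tp + fn_weight * fn + fp_weight * fp + eps)
    return max(0.0, 1.0 - tversky) ** gamma

def hybrid_focal_loss(probs, labels, balance=0.5):
    """Assumed combination: region-level term plus a weighted pixel-level term."""
    return focal_tversky_loss(probs, labels) + balance * focal_loss(probs, labels)

if __name__ == "__main__":
    labels = [0, 0, 0, 1]  # one edge pixel among three background pixels
    print(hybrid_focal_loss([0.1, 0.1, 0.1, 0.9], labels))
    print(hybrid_focal_loss([0.9, 0.9, 0.9, 0.1], labels))
```

In this sketch the Tversky term scores the overlap of the whole edge map, so it is less sensitive to the large number of background pixels, while the focal term still gives a per-pixel signal. How the paper actually weights the two terms is not stated in this summary.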

Abstract

Edge detection, as a fundamental task in computer vision, has garnered increasing attention. The advent of deep learning has significantly advanced this field. However, recent deep learning-based methods generally face two significant issues: 1) reliance on large-scale pre-trained weights, and 2) generation of thick edges. We construct a U-shaped encoder-decoder model named CPD-Net that addresses both issues simultaneously. In response to issue 1), we propose a novel cycle pixel difference convolution (CPDC), which integrates edge prior knowledge with modern convolution operations and thereby eliminates the dependence on large-scale pre-trained weights. As for issue 2), we construct a multi-scale information enhancement module (MSEM) and a dual residual connection-based (DRC) decoder to enhance the edge localization ability of the model, thereby generating crisp and clean contour maps. Comprehensive experiments conducted on four standard benchmarks demonstrate that our method achieves competitive performance on BSDS500 (ODS=0.813 and AC=0.352), NYUD-V2 (ODS=0.760 and AC=0.223), BIPED (ODS=0.898 and AC=0.426), and CID (ODS=0.59). Our approach provides a novel perspective for addressing these challenges in edge detection.

Paper Structure (16 sections, 5 equations, 9 figures, 7 tables)

This paper contains 16 sections, 5 equations, 9 figures, 7 tables.

Figures (9)

  • Figure 1: The overall architecture of our proposed cycle pixel difference network (CPD-Net). The whole network adopts a U-shape structure and can be split into four stages. Each stage consists of four CPDC blocks, an MSEM, and a DRC decoder. The MSEM serves as the skip connection to enhance the multi-scale information, and the DRC decoder can decode the features with complete edge information. The final predicted edge maps are obtained by channel-wise concatenation of the outputs from the lateral connection and the DRC decoder.
  • Figure 2: Different types of pixel difference computation. From (a) to (e): (a) indicates a $3\times3$ input feature map, where $x_i$ represents the $i$-th pixel value of the feature map. (b) to (e) illustrate the proposed cycle pixel difference operators, which calculate pixel differences in four directions: horizontal, vertical, diagonal, and cross. From (f) to (j): (f) and (h) represent a $5\times5$ and a $3\times3$ feature map, respectively. (g), (i), and (j) represent the pixel difference operators proposed in PiDiNet (Su et al., 2021).
  • Figure 3: Applying a standard convolution weight to pixel differences is mathematically equivalent to applying a weight difference template directly to the original pixels of the image.
  • Figure 4: The building block based on cycle pixel difference convolution.
  • Figure 5: The architecture of MSEM. It consists of four parallel branches, each with $1\times1$ and $3\times3-r$ convolutions where $r$ denotes the dilation rate, followed by channel-wise concatenation, a $1\times1$ convolution, and an SE block. A skip connection links the input to the processed features, allowing the module to adaptively enhance multi-scale information for edge detection.
  • ...and 4 more figures
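
The equivalence described in the Figure 3 caption can be checked numerically on a single $3\times3$ patch. The sketch below is a minimal illustration in plain Python, not the paper's operator: the `PARTNER` table is a hypothetical horizontal pairing in which each pixel is differenced with its right-hand neighbour and the last pixel of a row wraps to the first. The exact pairings used by CPDC are the ones shown in Figure 2 of the paper.

```python
import random

# Hypothetical pairing over a flattened 3x3 patch (indices 0..8, row-major):
# pixel i is differenced with PARTNER[i], its right neighbour with wrap-around.
PARTNER = [1, 2, 0, 4, 5, 3, 7, 8, 6]

def pixel_difference_response(patch, weights, partner=PARTNER):
    """Weight the pixel differences: sum_i w_i * (x_i - x_partner(i))."""
    return sum(w * (patch[i] - patch[partner[i]]) for i, w in enumerate(weights))

def weight_difference_template(weights, partner=PARTNER):
    """Fold the differencing into the kernel so it can be applied to raw pixels."""
    template = list(weights)
    for i, j in enumerate(partner):
        # Pixel j appears with a minus sign wherever it is the partner of pixel i.
        template[j] -= weights[i]
    return template

def standard_response(patch, template):
    """Ordinary convolution response at one location: sum_j t_j * x_j."""
    return sum(t * x for t, x in zip(template, patch))

if __name__ == "__main__":
    rng = random.Random(0)
    patch = [rng.random() for _ in range(9)]
    weights = [rng.uniform(-1.0, 1.0) for _ in range(9)]
    template = weight_difference_template(weights)
    direct = pixel_difference_response(patch, weights)
    folded = standard_response(patch, template)
    print(abs(direct - folded) < 1e-9)
```

Because the template is computed once from the learned weights, the differencing adds no extra cost when the kernel is applied to raw pixels. The template entries also sum to zero, so a constant patch always yields a zero response, which is the behaviour expected of an edge-sensitive kernel.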