Enhancing Edge Detection by Texture Handling Architecture and Noiseless Training Data

Hao Shu

TL;DR

The paper tackles the challenge of achieving high-precision edge detection (ED) under strict evaluation while addressing the impact of noisy human annotations. It introduces SDPED, an ED model built on Cascaded Skipping Density Blocks (CSDB). SDPED avoids down-sampling to preserve detail and uses an extended fusion block to improve feature integration, achieving state-of-the-art results with fewer parameters. A novel noiseless data augmentation strategy uses ground-truth edge maps as inputs, which makes it possible to train on noiseless data. This improves performance when edge maps are given as inputs and increases robustness across datasets. Across BRIND, UDED, MDBD, and BIPED, SDPED delivers substantial gains in AP and consistently outperforms prior methods, offering a practical path toward more reliable and data-efficient ED systems.
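The augmentation idea of feeding ground-truth edge maps in as inputs can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions and is not the author's implementation, which is in the repository linked in the abstract. The function name `noiseless_augment`, the swap probability `p`, and the choice to replicate the edge map to three channels are all hypothetical. The sketch shows why such pairs are noiseless: when the input is itself a binary edge map, the correct output is exactly that map, so no human annotation error enters the label.

```python
import numpy as np


def noiseless_augment(image, gt, p=0.5, rng=None):
    """Hypothetical sketch: with probability p, train on the pair (edge map, edge map).

    image: float array of shape (H, W, 3) in [0, 1].
    gt:    edge annotation of shape (H, W); nonzero pixels are edges.
    """
    rng = np.random.default_rng() if rng is None else rng
    gt = (gt > 0).astype(np.float32)  # binarize the label
    if rng.random() < p:
        # Assumption: replicate the 1-channel edge map so it matches the RGB input layout.
        # The target for this input is exactly its own edges, i.e. a noiseless label.
        image = np.repeat(gt[:, :, None], 3, axis=2)
    return image.astype(np.float32), gt


rng = np.random.default_rng(0)
image = rng.random((4, 5, 3))
gt = np.zeros((4, 5))
gt[2, :] = 1
x, y = noiseless_augment(image, gt, p=1.0, rng=rng)
print(x.shape, np.array_equal(x[:, :, 0], y))  # (4, 5, 3) True
```

Whether such pairs are swapped in per sample, as here, or appended as extra training samples is a design choice; the paper's Section III describes the scheme actually used.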

Abstract

Image edge detection (ED) is a fundamental task in computer vision. While convolution-based models have significantly advanced ED performance, achieving high precision under strict error tolerance constraints remains challenging. Furthermore, the reliance on noisy, human-annotated training data limits model performance, even when the inputs are edge maps themselves. In this paper, we address these challenges in two ways. First, we propose a novel ED model incorporating Cascaded Skipping Density Blocks (CSDB) to enhance precision and robustness. Our model achieves state-of-the-art (SOTA) performance across multiple datasets, with substantial improvements in average precision (AP), as demonstrated by extensive experiments. Second, we introduce a novel data augmentation strategy that enables the integration of noiseless annotations during training, improving model performance, particularly when processing edge maps directly. Our findings contribute a more precise ED architecture and the first method for integrating noiseless training data into ED tasks, offering potential directions for improving ED models. Code is available at https://github.com/Hao-B-Shu/SDPED.
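A strict error tolerance limits how far a predicted edge pixel may lie from a ground-truth edge and still count as correct. The sketch below is a simplified illustration, not the official benchmark. It uses a square (Chebyshev) pixel radius `r` and skips the one-to-one pixel matching that standard BSDS-style evaluation performs, where the tolerance is also scaled by the image diagonal. The names `dilate` and `tolerant_precision_recall` are hypothetical. The sketch computes precision and recall at a single binarization, whereas AP summarizes precision over a sweep of thresholds. It shows why one-pixel annotation shifts matter: an edge that is off by one row scores zero at `r = 0` and perfectly at `r = 1`.

```python
import numpy as np


def dilate(mask, r):
    """Binary dilation with a (2r+1) x (2r+1) square window (Chebyshev radius r)."""
    h, w = mask.shape
    padded = np.pad(mask, r)  # pads with False
    out = np.zeros_like(mask, dtype=bool)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def tolerant_precision_recall(pred, gt, r=1):
    """Simplified precision/recall: a pixel is matched if the other map has an
    edge within r pixels. No one-to-one matching is enforced (unlike BSDS)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp_pred = (pred & dilate(gt, r)).sum()  # predicted pixels near some GT edge
    tp_gt = (gt & dilate(pred, r)).sum()    # GT pixels near some predicted edge
    precision = tp_pred / max(pred.sum(), 1)
    recall = tp_gt / max(gt.sum(), 1)
    return float(precision), float(recall)


gt = np.zeros((6, 6), dtype=bool)
gt[2, :] = True                 # horizontal ground-truth edge on row 2
pred = np.roll(gt, 1, axis=0)   # the same edge shifted down by one pixel (row 3)
print(tolerant_precision_recall(pred, gt, r=0))  # (0.0, 0.0)
print(tolerant_precision_recall(pred, gt, r=1))  # (1.0, 1.0)
```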

Paper Structure

This paper contains 34 sections, 5 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Predictions of three previous ED models on the BRIND dataset. $Off_E$ denotes models retrained using the official data augmentation and tested on edge maps, while $Our_E$ represents models retrained using our proposed data augmentation (detailed in Section III) and tested on edge maps. Similarly, $Off_O$ and $Our_O$ correspond to models trained with respective augmentation strategies but tested on standard images. Our augmentation method produces sharper predictions on edge maps and potentially enhances performance on standard images.
  • Figure 2: Structure of the SDPED model. Blue squares denote $3\times 3$ convolutional layers, purple squares $1\times 1$ convolutional layers, pink squares LeakyReLU activations, and black squares the Sigmoid; $\oplus$ denotes pixel-wise addition, and a separate operator symbol denotes concatenation. The top path is the main processing pipeline: its first two convolution layers output 32 and 64 feature maps, respectively, and its last two convolution layers keep 64 input and 64 output feature maps. The pipeline comprises $n+2$ blocks (the feature extractor block, the $n$ CSDB units, and the final processing block), each of which produces an intermediate output compressed to 21 feature maps. Concatenating these intermediate outputs gives $21 \times (n+2)$ feature maps for the final fusing block. The bottom path is the fusing block, where the concatenated features pass through a $3\times 3$ convolution layer followed by two $1 \times 1$ convolution layers, with 256, 512, and 1 output feature maps, respectively.
  • Figure 3: Cascading Skipping Dense Block (CSDB): each CSDB consists of three cascaded SDBs connected via skip connections. Each SDB contains five convolution layers: the first four output 32 feature maps each, and the last outputs 64. Since each of the first four layers adds its 32 output maps to the input of the next, the input feature dimensions of the five layers grow progressively: 64, 96, 128, 160, and 192.
  • Figure 4: Edge annotation inconsistencies: (a) A typical image region; (b) and (c) illustrate different annotation styles, where black squares denote edge pixels. In (b) and (c), non-corner edge pixels may deviate by one pixel horizontally or vertically, while corner edge pixels may shift diagonally by one pixel.
  • Figure 5: Sample predictions on BRIND and UDED, all from partition $P_{1}$ and displayed without NMS. The results illustrate that our model generates more visually coherent and reliable predictions.
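The channel bookkeeping in the captions of Figures 2 and 3 can be checked with a small PyTorch sketch. It is a hypothetical reconstruction from the captions only, not the official code at https://github.com/Hao-B-Shu/SDPED. Several details are assumptions: the 3x3 kernels inside each SDB, the LeakyReLU slope of 0.2, the placement of activations, residual addition as the form of the "skip connections", and 1x1 convolutions as the side branches that compress each block to 21 maps. The class names `SDB`, `CSDB`, and `SDPEDSketch` are illustrative. All convolutions use stride 1 with padding, matching the claim that the model avoids down-sampling.

```python
import torch
import torch.nn as nn


def conv3x3(cin: int, cout: int) -> nn.Conv2d:
    # Stride 1 and padding 1 keep H x W unchanged, so nothing is down-sampled.
    return nn.Conv2d(cin, cout, kernel_size=3, padding=1)


def lrelu() -> nn.LeakyReLU:
    return nn.LeakyReLU(0.2)  # negative slope 0.2 is an assumption


class SDB(nn.Module):
    """Dense block: five (assumed 3x3) convs whose inputs are 64, 96, 128, 160, 192."""

    def __init__(self, channels: int = 64, growth: int = 32):
        super().__init__()
        # The first four convs each emit `growth` maps that are concatenated onward.
        self.convs = nn.ModuleList(
            [conv3x3(channels + i * growth, growth) for i in range(4)]
        )
        # The fifth conv maps the 192 concatenated channels back to 64.
        self.out = conv3x3(channels + 4 * growth, channels)
        self.act = lrelu()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [x]
        for conv in self.convs:
            feats.append(self.act(conv(torch.cat(feats, dim=1))))
        # Assumed skip connection: residual addition around the block.
        return x + self.out(torch.cat(feats, dim=1))


class CSDB(nn.Module):
    """Three cascaded SDBs with an assumed outer residual skip."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.blocks = nn.Sequential(*[SDB(channels) for _ in range(3)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.blocks(x)


class SDPEDSketch(nn.Module):
    """Hypothetical SDPED-like network: n CSDBs, 21-map side outputs, fusing block."""

    def __init__(self, n: int = 2, side: int = 21):
        super().__init__()
        self.extractor = nn.Sequential(conv3x3(3, 32), lrelu(), conv3x3(32, 64))
        self.csdbs = nn.ModuleList([CSDB(64) for _ in range(n)])
        self.final = nn.Sequential(conv3x3(64, 64), lrelu(), conv3x3(64, 64))
        # n + 2 side branches (extractor, each CSDB, final block), each 64 -> 21 maps.
        self.sides = nn.ModuleList([nn.Conv2d(64, side, 1) for _ in range(n + 2)])
        self.fuse = nn.Sequential(
            conv3x3(side * (n + 2), 256), lrelu(),
            nn.Conv2d(256, 512, 1), lrelu(),
            nn.Conv2d(512, 1, 1), nn.Sigmoid(),
        )

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        x = self.extractor(img)
        outs = [x]
        for block in self.csdbs:
            x = block(x)
            outs.append(x)
        outs.append(self.final(x))
        maps = [branch(o) for branch, o in zip(self.sides, outs)]
        # 21 * (n + 2) channels enter the fusing block; output is a 1-channel edge map.
        return self.fuse(torch.cat(maps, dim=1))


model = SDPEDSketch(n=2)
edges = model(torch.rand(1, 3, 32, 48))
print(edges.shape)  # torch.Size([1, 1, 32, 48])
```

With `n = 2`, the fusing block receives $21 \times 4 = 84$ channels, and the output keeps the input's spatial size because no layer changes the resolution.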