DiffusionEdge: Diffusion Probabilistic Model for Crisp Edge Detection

Yunfan Ye, Kai Xu, Yuhang Huang, Renjiao Yi, Zhiping Cai

TL;DR

DiffusionEdge is presented as the first diffusion probabilistic model for general edge detection. It produces crisp, accurate edge maps without post-processing by running a decoupled diffusion architecture in latent space. An adaptive FFT filter modulates the frequency components of the latent features, and uncertainty distillation supervises latent-space learning while preserving pixel-level uncertainty. The method reports better correctness and edge crispness than prior work on BSDS, NYUDv2, Multicue, and BIPED, while keeping training and inference costs reasonable. Because it directly recovers single-width edge contours and relies less on augmentation, DiffusionEdge is a practical end-to-end edge detector with strong potential for downstream perception tasks.

Abstract

Limited by the encoder-decoder architecture, learning-based edge detectors usually have difficulty predicting edge maps that satisfy both correctness and crispness. With the recent success of the diffusion probabilistic model (DPM), we found it is especially suitable for accurate and crisp edge detection, since the denoising process is applied directly at the original image size. Therefore, we propose the first diffusion model for the task of general edge detection, which we call DiffusionEdge. To reduce computational cost while retaining the final performance, we apply DPM in the latent space and enable the classic cross-entropy loss, which is uncertainty-aware at the pixel level, to directly optimize the parameters in latent space in a distillation manner. We also adopt a decoupled architecture to speed up the denoising process and propose a corresponding adaptive Fourier filter to adjust the latent features of specific frequencies. With all these technical designs, DiffusionEdge can be stably trained with limited resources, predicting crisp and accurate edge maps with far fewer augmentation strategies. Extensive experiments on four edge detection benchmarks demonstrate the superiority of DiffusionEdge in both correctness and crispness. On the NYUDv2 dataset, compared to the second best, we increase the ODS, OIS (without post-processing) and AC by 30.2%, 28.1% and 65.1%, respectively. Code: https://github.com/GuHuangAI/DiffusionEdge.
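The abstract reports ODS and OIS without defining them. As a rough illustration only, the sketch below shows how these two F-measure variants are commonly computed in BSDS-style edge benchmarks: ODS picks one threshold for the whole dataset, while OIS picks the best threshold for each image. This is not the authors' evaluation code. The count layout `(matched_gt, total_gt, matched_pred, total_pred)` and the function names are assumptions made for this example, and the tolerance-based matching of predicted edge pixels to ground truth is assumed to have been done beforehand.

```python
from typing import Dict, List, Tuple

# Hypothetical per-image counts at one binarization threshold:
# (matched_gt, total_gt, matched_pred, total_pred)
Counts = Tuple[int, int, int, int]
ImageCounts = Dict[float, Counts]  # threshold -> counts for one image


def f_score(counts: Counts) -> float:
    """Harmonic mean of precision and recall for one set of counts."""
    matched_gt, total_gt, matched_pred, total_pred = counts
    recall = matched_gt / total_gt if total_gt else 0.0
    precision = matched_pred / total_pred if total_pred else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def sum_counts(items: List[Counts]) -> Counts:
    """Add counts element-wise so precision/recall are computed on totals."""
    return (
        sum(c[0] for c in items),
        sum(c[1] for c in items),
        sum(c[2] for c in items),
        sum(c[3] for c in items),
    )


def ods(per_image: List[ImageCounts]) -> float:
    """Optimal Dataset Scale: best F-score using ONE threshold for all images.

    Assumes every image was evaluated at the same set of thresholds.
    """
    thresholds = list(per_image[0].keys())
    return max(
        f_score(sum_counts([img[t] for img in per_image])) for t in thresholds
    )


def ois(per_image: List[ImageCounts]) -> float:
    """Optimal Image Scale: each image uses its own best threshold."""
    best_per_image = []
    for img in per_image:
        best_t = max(img, key=lambda t: f_score(img[t]))
        best_per_image.append(img[best_t])
    return f_score(sum_counts(best_per_image))
```

Because OIS may choose a different threshold for every image, it is typically at least as high as ODS on the same predictions; a large gap between the two suggests that the best threshold varies considerably from image to image.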

Paper Structure (19 sections, 10 equations, 6 figures, 7 tables)

Figures (6)

  • Figure 1: CNN-based methods, even the most recent state-of-the-art one (UAED [zhou2023treasure]), generally use an encoder-decoder architecture and tend to produce thick, noisy edges. We propose a diffusion-based edge detector that is superior in both correctness and crispness without any post-processing.
  • Figure 2: The overall framework of the proposed DiffusionEdge.
  • Figure 3: Examples of two baselines with accuracy and memory cost.
  • Figure 4: Qualitative comparisons on the BSDS dataset with previous state-of-the-art methods. Edge maps generated by our DiffusionEdge are both accurate and crisp, with less noise. Zooming in is highly recommended to observe the details.
  • Figure 5: Qualitative comparisons on NYUDv2 dataset with two state-of-the-art CNN-based and transformer-based methods. Edge maps generated by DiffusionEdge are much crisper and cleaner with competitive performance.
  • ...and 1 more figure