Holistically-Nested Edge Detection

Saining Xie, Zhuowen Tu

TL;DR

The paper tackles edge and boundary detection in natural images by introducing holistically-nested edge detection (HED), a fully convolutional, image-to-image network trained with deep supervision on multiple side outputs. HED trims a VGG-16 backbone and attaches side-output branches at multiple depths, so it learns multi-scale representations and combines them with a learned weighted-fusion layer. Experiments on BSDS500 and NYUDv2 report state-of-the-art results (ODS F-scores of .782 and .746, respectively), with gains attributed to multi-scale supervision, consensus labeling, and depth encoded as HHA features for RGB-D input. Inference takes about 0.4 s per image on a GPU, and accuracy improves further with additional training data.

Abstract

We develop a new edge detection algorithm that tackles two important issues in this long-standing vision problem: (1) holistic image training and prediction; and (2) multi-scale and multi-level feature learning. Our proposed method, holistically-nested edge detection (HED), performs image-to-image prediction by means of a deep learning model that leverages fully convolutional neural networks and deeply-supervised nets. HED automatically learns rich hierarchical representations (guided by deep supervision on side responses) that are important in order to approach the human ability to resolve the challenging ambiguity in edge and object boundary detection. We significantly advance the state-of-the-art on the BSDS500 dataset (ODS F-score of .782) and the NYU Depth dataset (ODS F-score of .746), and do so with an improved speed (0.4 seconds per image) that is orders of magnitude faster than some recent CNN-based edge detection algorithms.
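The ODS (optimal dataset scale) F-score quoted above is the best F-measure obtained when one binarization threshold is shared by every image in the dataset. The sketch below illustrates that idea only. It assumes plain pixelwise matching, whereas the BSDS500 benchmark matches predicted and annotated edge pixels within a small distance tolerance. The function names and the toy data are hypothetical.

```python
def f_score(tp, fp, fn):
    """F-measure (harmonic mean of precision and recall) from raw counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def counts_at(pred, truth, threshold):
    """Pixelwise TP/FP/FN for one image (assumption: no distance tolerance)."""
    tp = fp = fn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            is_edge = p >= threshold
            if is_edge and t:
                tp += 1
            elif is_edge:
                fp += 1
            elif t:
                fn += 1
    return tp, fp, fn


def ods_f_score(preds, truths, thresholds):
    """Best F-score over thresholds, with counts pooled across all images."""
    best_f, best_t = 0.0, None
    for t in thresholds:
        totals = [0, 0, 0]
        for pred, truth in zip(preds, truths):
            for i, c in enumerate(counts_at(pred, truth, t)):
                totals[i] += c
        f = f_score(*totals)
        if f > best_f:
            best_f, best_t = f, t
    return best_f, best_t


# Toy dataset: two 2x2 edge-probability maps with binary ground truth.
preds = [[[0.9, 0.2], [0.6, 0.1]], [[0.8, 0.7], [0.3, 0.4]]]
truths = [[[1, 0], [1, 0]], [[1, 0], [0, 1]]]
best_f, best_t = ods_f_score(preds, truths, [0.25, 0.5, 0.75])
```

Pooling the counts before computing the F-measure is what makes the threshold a dataset-level choice; selecting the best threshold separately for each image would give the per-image variant (OIS) instead.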

Paper Structure

This paper contains 12 sections, 7 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: Illustration of the proposed HED algorithm. In the first row: (a) shows an example test image in the BSDS500 dataset (Martin et al., 2004); (b) shows its corresponding edges as annotated by human subjects; (c) displays the HED results. In the second row: (d), (e), and (f), respectively, show side edge responses from layers $2$, $3$, and $4$ of our convolutional neural networks. In the third row: (g), (h), and (i), respectively, show edge responses from the Canny detector (Canny, 1986) at the scales $\sigma=2.0$, $\sigma=4.0$, and $\sigma=8.0$. HED shows a clear advantage in consistency over Canny.
  • Figure 2: Illustration of different multi-scale deep learning architecture configurations: (a) multi-stream architecture; (b) skip-layer net architecture; (c) a single model running on multi-scale inputs; (d) separate training of different networks; (e) our proposed holistically-nested architectures, where multiple side outputs are added.
  • Figure 3: Illustration of our network architecture for edge detection, highlighting the error backpropagation paths. Side-output layers are inserted after convolutional layers. Deep supervision is imposed at each side-output layer, guiding the side-outputs towards edge predictions with the characteristics we desire. The outputs of HED are multi-scale and multi-level, with the side-output-plane size becoming smaller and the receptive field size becoming larger. One weighted-fusion layer is added to automatically learn how to combine outputs from multiple scales. The entire network is trained with multiple error propagation paths (dashed lines).
  • Figure 4: Two examples illustrating how deep supervision helps side-output layers to produce multi-scale dense predictions. Note that in the left column, the side outputs become progressively coarser and more "global", while critical object boundaries are preserved. In the right column, the predictions tend to lack any discernible order (e.g. in layers 1 and 2), and many boundaries are lost in later stages.
  • Figure 5: Results on the BSDS500 dataset. Our proposed HED framework achieves the best result (ODS=.782). Compared to several recent CNN-based edge detectors, our approach is also orders of magnitude faster. See the BSDS500 results table (tb:bsds) for a detailed discussion.
  • ...and 2 more figures
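The Figure 3 caption describes side outputs at several scales, a weighted-fusion layer that combines them, and a loss applied to every side output as well as to the fused result. The following is a minimal sketch of that structure, not the authors' implementation. It assumes nearest-neighbour upsampling in place of the network's own upsampling, plain (unweighted) binary cross-entropy, fixed equal fusion weights rather than learned ones, and two hand-written logit maps instead of real VGG-16 features. All names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def upsample_nearest(grid, factor):
    """Repeat every cell `factor` times along both axes (assumed stand-in upsampler)."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def fuse(side_logits, weights):
    """Weighted sum of same-sized side-output logit maps, followed by a sigmoid."""
    height, width = len(side_logits[0]), len(side_logits[0][0])
    return [
        [sigmoid(sum(w * s[y][x] for w, s in zip(weights, side_logits)))
         for x in range(width)]
        for y in range(height)
    ]


def cross_entropy(probs, truth):
    """Mean binary cross-entropy over all pixels (assumption: no class balancing)."""
    eps = 1e-12
    terms = [
        -(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps))
        for p_row, t_row in zip(probs, truth)
        for p, t in zip(p_row, t_row)
    ]
    return sum(terms) / len(terms)


def deeply_supervised_loss(side_logits, weights, truth):
    """One loss term per side output plus one term for the fused prediction."""
    side_losses = [
        cross_entropy([[sigmoid(v) for v in row] for row in s], truth)
        for s in side_logits
    ]
    return sum(side_losses) + cross_entropy(fuse(side_logits, weights), truth)


# Toy example: a fine 2x2 side output and a coarse 1x1 side output upsampled to 2x2.
fine = [[3.0, -3.0], [3.0, -3.0]]
coarse = upsample_nearest([[1.0]], 2)
truth = [[1, 0], [1, 0]]
fusion_weights = [0.5, 0.5]  # illustrative values; in HED these weights are learned
fused = fuse([fine, coarse], fusion_weights)
loss = deeply_supervised_loss([fine, coarse], fusion_weights, truth)
```

Because each side output contributes its own loss term, gradients reach the intermediate layers directly rather than only through the fused prediction, which corresponds to the multiple error propagation paths the caption mentions.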