The Effectiveness of Edge Detection Evaluation Metrics for Automated Coastline Detection

Conor O'Sullivan, Seamus Coveney, Xavier Monteys, Soumyabrata Dev

TL;DR

FOM was the most reliable metric for selecting the best Canny hysteresis threshold for coastline detection, whereas RMSE, PSNR and SSIM rarely selected it and appear unsuitable for evaluating edge detection in general.

Abstract

We analyse the effectiveness of RMSE, PSNR, SSIM and FOM for evaluating edge detection algorithms used for automated coastline detection. Typically, the accuracy of detected coastlines is assessed visually. This can be impractical on a large scale, leading to the need for objective evaluation metrics. Hence, we conduct an experiment to find reliable metrics. We apply Canny edge detection to 95 coastline satellite images across 49 testing locations. We vary the hysteresis thresholds and compare metric values to a visual analysis of detected edges. We found that FOM was the most reliable metric for selecting the best threshold. It could select a better threshold 92.6% of the time and the best threshold 66.3% of the time. By comparison, RMSE, PSNR and SSIM could select the best threshold only 6.3%, 6.3% and 11.6% of the time respectively. We provide a reason for these results by reformulating RMSE, PSNR and SSIM in terms of confusion matrix measures. This suggests these metrics not only fail for this experiment but are not useful for evaluating edge detection in general.
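
To make the contrast between FOM and the pixel-wise metrics concrete, the sketch below compares Pratt's Figure of Merit with RMSE on toy binary edge maps. It is an illustration only, not the authors' implementation. The function names, the scaling constant `alpha = 1/9` (a commonly used default for FOM), the brute-force distance search and the 8x8 test images are all assumptions introduced here. For 0/1 edge maps every false positive or false negative contributes a squared error of 1, so RMSE reduces to sqrt((FP + FN) / N); the paper's own reformulation may use different notation or scaling.

```python
import numpy as np


def pratt_fom(detected, truth, alpha=1.0 / 9.0):
    """Pratt's Figure of Merit for two binary edge maps (1.0 = perfect match)."""
    det_pts = np.argwhere(np.asarray(detected, dtype=bool))  # detected edge pixels
    true_pts = np.argwhere(np.asarray(truth, dtype=bool))    # ground truth edge pixels
    if len(det_pts) == 0 or len(true_pts) == 0:
        return 0.0  # convention chosen here for empty edge maps (an assumption)
    # Squared distance from every detected pixel to its nearest true edge pixel.
    # Brute force: fine for a toy example, too memory-hungry for full images.
    diff = det_pts[:, None, :] - true_pts[None, :, :]
    d2 = (diff ** 2).sum(axis=2).min(axis=1)
    score = np.sum(1.0 / (1.0 + alpha * d2))
    return float(score / max(len(det_pts), len(true_pts)))


def rmse(detected, truth):
    """Pixel-wise RMSE; for 0/1 maps this equals sqrt((FP + FN) / N)."""
    a = np.asarray(detected, dtype=float)
    b = np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean((a - b) ** 2)))


def vertical_edge(col, shape=(8, 8)):
    """Hypothetical toy edge map: a single vertical line at the given column."""
    edge = np.zeros(shape, dtype=bool)
    edge[:, col] = True
    return edge


truth = vertical_edge(4)
near = vertical_edge(5)  # detected edge displaced by 1 pixel
far = vertical_edge(7)   # detected edge displaced by 3 pixels

for name, det in [("near", near), ("far", far)]:
    print(name, "RMSE:", round(rmse(det, truth), 3),
          "FOM:", round(pratt_fom(det, truth), 3))
```

In this toy case both detections have the same number of false positives and false negatives, so RMSE is 0.5 for each, while FOM is about 0.9 for the 1-pixel displacement and 0.5 for the 3-pixel displacement. This is one way to see why a metric that depends only on confusion matrix counts cannot reward an edge for being close to the true coastline.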

Paper Structure

This paper contains 4 sections, 24 equations, 6 figures and 2 tables.

Figures (6)

  • Figure 1: Input images and ground truth masks obtained from SWED [seale2022swed]. The ground truth edge map has been obtained by applying Canny edge detection to the ground truth mask.
  • Figure 2: Average RMSE, PSNR, SSIM and FOM for each spectral band with error bars given by the standard deviation. Based on RMSE, PSNR and SSIM, the performance improves as we increase the thresholds and the NIR band is the worst-performing band. For FOM, we do not see a monotonic improvement for any of the spectral bands.
  • Figure 3: Examples of where FOM correctly identifies the detected edge map that is visually the most similar to the ground truth. Each row gives a different example image. The columns give the edge maps detected using increasing hysteresis thresholds. In each case, the NIR band is used as the input to Canny. The bold numbers give the best value for each metric.
  • Figure 4: Examples of where FOM fails to identify the best-detected edge map. The bold numbers give the best value for each metric. The NIR band is used as input for Canny.
  • Figure 5: The effect on confusion matrix measures as we increase the hysteresis thresholds. Only the minimum and maximum thresholds are given. The NIR band is used as input for Canny.
  • ...and 1 more figure