Comparison Study: Glacier Calving Front Delineation in Synthetic Aperture Radar Images With Deep Learning

Nora Gourmelon; Konrad Heidler; Erik Loebel; Daniel Cheng; Julian Klink; Anda Dong; Fei Wu; Noah Maul; Moritz Koch; Marcel Dreier; Dakota Pyles; Thorsten Seehaus; Matthias Braun; Andreas Maier; Vincent Christlein

TL;DR

This work addresses the automatic delineation of glacier calving fronts in SAR imagery; the front position is a key indicator of glacier mass loss. It conducts a head-to-head comparison of 22 DL systems on the common CaFFe benchmark and adds a multi-annotator human baseline to measure the gap between automatic and human performance. The best DL system, the Vision Transformer-based HookFormer, deviates from the reference front by about 200 m on average (MDE ≈ 183–221 m) and remains significantly worse than human annotators (≈ 38 m), so fully automated monitoring remains challenging. The study identifies Vision Transformers and foundation models as promising directions and emphasizes the need for automated plausibility checks and better integration of global information for practical, large-scale monitoring.

Abstract

Calving front position variation of marine-terminating glaciers is an indicator of ice mass loss and a crucial parameter in numerical glacier models. Deep Learning (DL) systems can automatically extract this position from Synthetic Aperture Radar (SAR) imagery, enabling continuous, weather- and illumination-independent, large-scale monitoring. This study presents the first comparison of DL systems on a common calving front benchmark dataset. A multi-annotator study with ten annotators is performed to contrast the best-performing DL system against human performance. The best DL model's outputs deviate 221 m on average, while the average deviation of the human annotators is 38 m. This significant difference shows that current DL systems do not yet match human performance and that further research is needed to enable fully automated monitoring of glacier calving fronts. The study of Vision Transformers, foundation models, and the inclusion and processing strategy of more information are identified as avenues for future research.
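The deviations quoted above are average distances between a delineated front and a reference front. This page does not spell out the formula, so the following is only a minimal sketch under an assumption: the deviation is taken as a symmetric mean nearest-point distance between the two fronts, each given as a list of pixel coordinates. The function name `mean_distance_error` and the `metres_per_pixel` parameter are hypothetical, and this is not the study's evaluation code.

```python
import math


def mean_distance_error(pred_px, true_px, metres_per_pixel=1.0):
    """Symmetric mean nearest-point distance between two fronts (assumed metric).

    pred_px, true_px: non-empty lists of (row, col) pixel coordinates.
    metres_per_pixel: hypothetical ground resolution used to convert to meters.
    """
    def nearest(point, others):
        # Distance from one front pixel to the closest pixel of the other front.
        return min(math.dist(point, other) for other in others)

    # Sum nearest distances in both directions so that neither front is favored.
    total = sum(nearest(p, true_px) for p in pred_px)
    total += sum(nearest(q, pred_px) for q in true_px)
    return metres_per_pixel * total / (len(pred_px) + len(true_px))


# Toy example: two parallel three-pixel fronts, 3 pixels apart, 10 m per pixel.
predicted = [(0, 0), (1, 0), (2, 0)]
reference = [(0, 3), (1, 3), (2, 3)]
print(mean_distance_error(predicted, reference, metres_per_pixel=10.0))  # 30.0
```

Under this reading, a value of 221 m means that a front pixel lies on average 221 m from the nearest pixel of the other front. The sketch also shows why images with no predicted front have to be counted separately: the distance is undefined when one of the two point lists is empty.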

Paper Structure (15 sections, 1 equation, 7 figures, 10 tables)

Introduction
Benchmark dataset
Methodology
Deep Learning system comparison
Evaluation metrics
Statistical Analysis
Multi-annotator study
DL versus humans
Results
Influences on calving front delineation performance of Deep Learning systems
Variations between manual annotations
Significant difference between humans and DL
Conclusion
References Section
Biography Section

Figures (7)

Figure 1: Overview of the MDE with confidence intervals alongside the number of images with no predicted front for all 22 DL systems and the comparisons of the multi-annotator study. The number of images with no predicted front is log-scale intensity encoded from zero to the number of images in the test set. The multi-annotator study is on the right side of the violet dashed line. All comparisons in the multi-annotator study were performed with additionally post-processed calving fronts. The asterisk (*) indicates that the outputs were not compared with CaFFe's test set but with combined annotations from the multi-annotator study.
Figure 2: Predicted calving fronts of the five best-performing DL systems for an image of the Mapple Glacier taken on 24th October 2008 by the TerraSAR-X satellite. Yellow depicts the prediction, blue is used for the ground truth front, and pink signifies a perfect match between prediction and ground truth. The bounding box is given in turquoise.
Figure 3: Predicted calving fronts of the five best-performing DL systems for an image of the Columbia Glacier taken on 8th September 2017 by the Sentinel-1 satellite. Yellow depicts the prediction, blue is used for the ground truth front, and pink signifies a perfect match between prediction and ground truth. The bounding box is given in turquoise.
Figure 4: Visualizations for all ten annotations by humans (shades of blue), the five post-processed HookFormer runs (shades of red), and the aggregation of human annotations (yellow). (a) shows the Mapple Glacier on 2nd November 2009, acquired by the TSX satellite; (b) shows the Columbia Glacier on 15th March 2016, acquired by the TDX satellite; (c) shows the Mapple Glacier on 9th June 2007, acquired by the Envisat satellite; and (d) shows the Columbia Glacier on 6th January 2018, acquired by the S1 satellite.
Figure 5: Predicted calving fronts of all 22 DL systems for an image of the Columbia Glacier taken on 2nd January 2012 by the TanDEM-X satellite. Yellow depicts the prediction, blue is used for the ground truth front, and pink signifies a perfect match between prediction and ground truth. The bounding box is given in turquoise.
...and 2 more figures
