Scalable Higher Resolution Polar Sea Ice Classification and Freeboard Calculation from ICESat-2 ATL03 Data
Jurdana Masuma Iqrah, Younghyun Koo, Wei Wang, Hongjie Xie, Sushil K. Prasad
TL;DR
This work addresses the need for higher-resolution sea ice surface height and freeboard information beyond the ATL07/ATL10 products by reprocessing ICESat-2 ATL03 data at 2 m resolution. It couples Sentinel-2–based auto-labeling with deep learning (LSTM and MLP) to classify ATL03 segments into thick ice, thin ice, and open water, and uses Horovod for distributed training to scale on multi-GPU clusters. The authors also implement PySpark-based parallelization for auto-labeling and freeboard computation, achieving up to 16.25x auto-labeling speedups and 8.5x data-loading plus 15.7x map-reduce speedups for freeboard, with the LSTM model reaching 96.56% accuracy versus 91.80% for the MLP. The resulting high-resolution local sea surface height and freeboard products offer improved representations of sea ice dynamics in polar regions, demonstrating scalable methods that could enable polar-wide products in a cloud-enabled pipeline.
Abstract
NASA's ICESat-2 (IS2) is an Earth-observing satellite that measures surface elevation at high resolution. IS2's ATL07 and ATL10 products provide sea ice elevation and freeboard in segments of 10 m to 200 m, each aggregating 150 signal photons from the raw ATL03 (geolocated photon) data. These aggregated products can overestimate local sea surface height and therefore underestimate freeboard (the height of sea ice above the sea surface). To obtain sea surface height and freeboard at higher resolution, in this work we resample the ATL03 data with a 2 m window. We then classify these 2 m segments into thick sea ice, thin ice, and open water using deep learning models: a long short-term memory (LSTM) network and a multi-layer perceptron (MLP). To obtain labeled training data for these models, we use segmented Sentinel-2 (S2) multispectral imagery that overlaps IS2 tracks in space and time to auto-label the IS2 data, followed by manual corrections in transition regions between ice/water types and in cloudy regions. We parallelize this auto-labeling workflow with PySpark and achieve a 9-fold data-loading speedup and a 16.25-fold map-reduce speedup. To train our models, we employ a Horovod-based distributed deep learning workflow on an 8-GPU DGX A100 cluster, achieving a 7.25-fold speedup. Next, we calculate local sea surface heights from the open water segments. Finally, we scale the freeboard calculation using the derived local sea level and achieve an 8.54-fold data-loading speedup and a 15.7-fold map-reduce speedup. Compared with the ATL07 (local sea level) and ATL10 (freeboard) data products, our results show higher resolution and accuracy (96.56%).
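
The resampling and freeboard steps described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `resample_2m` and `freeboard`, the `OPEN_WATER` label encoding, the use of a plain mean per window, and the use of a single mean over all open water segments as the local sea surface height are assumptions made for illustration only.

```python
import numpy as np

OPEN_WATER = 2  # hypothetical label encoding: 0 = thick ice, 1 = thin ice, 2 = open water


def resample_2m(along_track_m, height_m, window_m=2.0):
    """Average photon heights in fixed along-track windows (default 2 m).

    Returns the centers and mean heights of the non-empty windows.
    """
    along_track_m = np.asarray(along_track_m, dtype=float)
    height_m = np.asarray(height_m, dtype=float)
    start = along_track_m.min()
    # Integer window index of each photon, measured from the first photon.
    idx = np.floor((along_track_m - start) / window_m).astype(int)
    n_windows = idx.max() + 1
    counts = np.bincount(idx, minlength=n_windows)
    sums = np.bincount(idx, weights=height_m, minlength=n_windows)
    valid = counts > 0  # skip windows that contain no photons
    centers = start + (np.arange(n_windows) + 0.5) * window_m
    return centers[valid], sums[valid] / counts[valid]


def freeboard(segment_height_m, segment_label):
    """Freeboard = segment height minus local sea surface height.

    Assumption: the local sea surface is the mean height of all segments
    labeled as open water in the arrays passed in.
    """
    segment_height_m = np.asarray(segment_height_m, dtype=float)
    water = np.asarray(segment_label) == OPEN_WATER
    if not water.any():
        raise ValueError("no open water segments to define the sea surface")
    local_ssh = segment_height_m[water].mean()
    return local_ssh, segment_height_m - local_ssh
```

In practice the labels would come from the trained classifier, and the sea surface reference would be computed per local stretch of track rather than over an entire array; the sketch only shows the arithmetic of the two steps.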
