Breaking The Ice: Video Segmentation for Close-Range Ice-Covered Waters
Corwin Grant Jeon MacMillan, K. Andrea Scott, Matthew Garvin, Zhao Pan
TL;DR
This work addresses automated ice-condition assessment from close-range ship imagery in the Arctic, where occlusions and lens artifacts hinder single-image segmentation. It introduces UPerFlow, a video semantic segmentation model that combines a six-channel ResNet encoder, two segmentation decoders, and a PWCNet-based optical-flow path. These components are linked by cross-connections and use bi-directional flow to exploit temporal context. The work also presents a new Amundsen dataset of 946 labeled images. The images were annotated with a semi-manual, region-based technique across six classes (iceberg, ice floe, water, brash ice, ship, sky), with careful handling of lens artifacts. Empirically, UPerFlow achieves strong overall performance, reaching a mean intersection over union (mIoU) of up to 0.844 and a mean accuracy (mAcc) of 0.948. It also handles occlusion markedly better, reaching an mIoU of up to 0.736 under heavy occlusion and outperforming image-based baselines by an average of 38% in occluded regions. These results show robustness to real-world occlusions and have direct implications for safer, data-driven navigation in increasingly accessible Arctic waters.
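The reported scores, mIoU and mAcc, are standard semantic-segmentation metrics derived from a per-pixel confusion matrix. The sketch below shows how these two metrics are commonly computed. It is not the authors' evaluation code. The function name `segmentation_metrics` and the integer class indices are illustrative assumptions. Classes that never appear in either the labels or the predictions are skipped when averaging, which is one common convention among several.

```python
# Hypothetical helper illustrating the common definitions of mIoU and mAcc;
# not taken from the UPerFlow evaluation code.
CLASSES = ["iceberg", "ice floe", "water", "brash ice", "ship", "sky"]


def segmentation_metrics(y_true, y_pred, num_classes):
    """Return (mIoU, mAcc) for flat sequences of integer class labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and equal in length")

    # confusion[i][j] counts pixels whose true class is i and predicted class is j.
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        confusion[t][p] += 1

    ious, accs = [], []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp  # true class c, predicted as something else
        fp = sum(confusion[r][c] for r in range(num_classes)) - tp  # predicted c, wrongly
        if tp + fp + fn > 0:  # skip classes absent from both labels and predictions
            ious.append(tp / (tp + fp + fn))
        if tp + fn > 0:  # per-class accuracy needs ground-truth pixels of class c
            accs.append(tp / (tp + fn))
    return sum(ious) / len(ious), sum(accs) / len(accs)


# Toy example: six pixels drawn from the "water" (2), "sky" (5) and "ice floe" (1) classes.
miou, macc = segmentation_metrics([2, 2, 2, 5, 5, 1], [2, 2, 5, 5, 5, 1], len(CLASSES))
print(f"mIoU={miou:.3f}, mAcc={macc:.3f}")
```

For this toy input, one water pixel mislabeled as sky lowers the IoU of both classes but lowers the accuracy of water only. This explains why mAcc is typically higher than mIoU, as in the scores reported above.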
Abstract
Rapid ice recession in the Arctic Ocean, with predictions of ice-free summers by 2060, is opening new maritime routes, but safe use of these routes requires reliable navigation solutions. Current approaches rely heavily on subjective expert judgment, underscoring the need for automated, data-driven solutions. This study uses machine learning to assess ice conditions from ship-borne optical data. It introduces a finely annotated dataset of 946 images and a semi-manual, region-based annotation technique. The proposed video segmentation model, UPerFlow, extends the SegFlow architecture with four additions: a six-channel ResNet encoder, two UPerNet-based segmentation decoders (one for each image), PWCNet as the optical flow encoder, and cross-connections that integrate bi-directional flow features without loss of latent information. The proposed architecture outperforms baseline image segmentation networks by an average of 38% in occluded regions, demonstrating the robustness of video segmentation in challenging Arctic conditions.
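The abstract does not spell out what the six input channels are. A natural reading is two consecutive RGB frames concatenated along the channel axis, so that 3 + 3 = 6 channels. The sketch below illustrates only that assumed input layout, using plain nested lists in place of tensors. The helper name `stack_frames` is hypothetical. Pairing the frames in both temporal orders mirrors the idea of bi-directional flow, but it is our assumption, not a description of how UPerFlow feeds its encoder or PWCNet.

```python
# Hypothetical illustration of a six-channel input built from two RGB frames.
# Frames are H x W x 3 nested lists; the result is H x W x 6.


def stack_frames(frame_a, frame_b):
    """Concatenate two equally sized RGB frames channel-wise (assumed layout)."""
    if len(frame_a) != len(frame_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(frame_a, frame_b)
    ):
        raise ValueError("frames must have the same height and width")
    return [
        [list(px_a) + list(px_b) for px_a, px_b in zip(row_a, row_b)]
        for row_a, row_b in zip(frame_a, frame_b)
    ]


# Two tiny 2 x 2 "frames" at times t and t+1.
frame_t = [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]]
frame_t1 = [[[101, 102, 103], [104, 105, 106]], [[107, 108, 109], [110, 111, 112]]]

forward_pair = stack_frames(frame_t, frame_t1)   # (t, t+1) ordering
backward_pair = stack_frames(frame_t1, frame_t)  # (t+1, t) ordering
print(forward_pair[0][0])   # six values for the top-left pixel
print(backward_pair[0][0])
```

In a real pipeline, the same concatenation would typically be a single tensor operation on the channel dimension. The first convolution of a standard three-channel ResNet would then need to be widened to accept six channels.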
