Breaking The Ice: Video Segmentation for Close-Range Ice-Covered Waters

Corwin Grant Jeon MacMillan, K. Andrea Scott, Matthew Garvin, Zhao Pan

TL;DR

This work addresses automated ice-condition assessment from close-range ship imagery in the Arctic, where occlusions and lens artifacts hinder single-image segmentation. It introduces UPerFlow, a video semantic segmentation model that combines a six-channel ResNet encoder, dual segmentation decoders, and a PWCNet-based optical-flow path; cross-connections link the two branches, and bi-directional flow supplies temporal context. The accompanying Amundsen dataset contains 946 labeled images with region-based, semi-manual annotations across six classes (iceberg, ice floe, water, brash ice, ship, sky) and explicit handling of lens artifacts. Empirically, UPerFlow reaches an mIoU of up to 0.844 and an mAcc of 0.948 overall, and an mIoU of up to 0.736 under heavy occlusion, outperforming image-based baselines by about 38% in occluded regions. These results have direct implications for safer, data-driven navigation in increasingly accessible Arctic waters.
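For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how mIoU (mean intersection over union) and mAcc (mean per-class accuracy) are commonly computed from flattened per-pixel labels. The function name `miou_and_macc` and the choice to skip classes that never occur are assumptions made for illustration; the paper's own evaluation code may differ.

```python
from collections import Counter


def miou_and_macc(y_true, y_pred, num_classes):
    """Return (mIoU, mAcc) for two equal-length sequences of class indices.

    Assumption: a class with an empty union is left out of the IoU mean, and
    a class with no ground-truth pixels is left out of the accuracy mean.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count how often each (true label, predicted label) pair occurs.
    pairs = Counter(zip(y_true, y_pred))

    ious, accs = [], []
    for c in range(num_classes):
        tp = pairs[(c, c)]
        fn = sum(n for (t, p), n in pairs.items() if t == c and p != c)
        fp = sum(n for (t, p), n in pairs.items() if p == c and t != c)
        if tp + fp + fn > 0:
            ious.append(tp / (tp + fp + fn))  # intersection over union
        if tp + fn > 0:
            accs.append(tp / (tp + fn))       # per-class pixel accuracy

    if not ious or not accs:
        raise ValueError("no labeled pixels to evaluate")
    return sum(ious) / len(ious), sum(accs) / len(accs)
```

In practice the inputs would be the flattened ground-truth and predicted label maps for the six classes listed above, with `num_classes=6`.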

Abstract

Rapid ice recession in the Arctic Ocean, with predictions of ice-free summers by 2060, opens new maritime routes but requires reliable navigation solutions. Current approaches rely heavily on subjective expert judgment, underscoring the need for automated, data-driven solutions. This study leverages machine learning to assess ice conditions using ship-borne optical data, introducing a finely annotated dataset of 946 images and a semi-manual, region-based annotation technique. The proposed video segmentation model, UPerFlow, advances the SegFlow architecture by incorporating a six-channel ResNet encoder, two UPerNet-based segmentation decoders (one for each image), PWCNet as the optical flow encoder, and cross-connections that integrate bi-directional flow features without loss of latent information. The proposed architecture outperforms baseline image segmentation networks by an average of 38% in occluded regions, demonstrating the robustness of video segmentation in addressing challenging Arctic conditions.

Paper Structure

This paper contains 20 sections, 4 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Examples of images from the Amundsen dataset. \ref{fig:unlabelrain} shows a common occlusion in the data, where a water droplet on the lens partially obscures the image. \ref{fig:valframe360} shows an example from the validation dataset. \ref{fig:nvidialabel} shows \ref{fig:valframe360} as it appeared in NVIDIA's 2021 keynote \cite{NVIDIAkeynote}.
  • Figure 2: Diagram relating distance in the real world to its projection onto the image plane. Equal distances in the region of interest (ROI) are represented with dashed colored lines, which all converge to the focal point of the lens. The furthest point visible in the ROI is denoted $z_\infty$, and the corresponding point in the image plane is denoted $y_\infty$. The camera is included for illustrative purposes.
  • Figure 3: The labeling process for the Amundsen dataset. \ref{fig:frame730} shows a manually labeled image, with the ship labeled in red, the sky in light green, and the iceberg in purple. \ref{fig:frame730otsu} shows the result of MATLAB's multithresh, with the brightest classes in each region (represented by dashed multicolored lines) shown in yellow. \ref{fig:waterthreshold} shows the threshold distribution used to label water, with pixels of intensity less than 100 labeled as water in the near-field. \ref{fig:frame730labeloverlay} shows the final annotation, overlaid on the original image.
  • Figure 4: Examples of labeled images from the Amundsen dataset. The original images are on the left, and the labels are on the right, overlaid on the original images for clarity. \ref{fig:1_frame_215} and \ref{fig:1_frame_031} are examples from the training dataset. \ref{fig:1_frame_360} is an example from the validation dataset. \ref{fig:2_frame_050} is an example from the test dataset.
  • Figure 5: The UPerFlow framework is depicted in \ref{fig:uperflow_components}, where UPerNet serves as the segmentation branch (blue) and PWCNet serves as the optical flow branch (green). Concatenation operations are represented by pink circles, and the Pyramid Pooling Module of PSPNet \cite{zhao2017pspnet} is denoted by PPM. The optical flow branch is depicted predicting the backward flow, while the segmentation branch generates segmentations for the first image. Yellow-highlighted sections indicate duplicated components within the architecture, responsible for predicting forward flow and segmentations for the second image. The contributions of this work are marked in purple: (i) a six-channel input ResNet encoder, (ii) cross-connections from PWCNet, (iii) a duplicated segmentation decoder, one for each image, and (iv) duplicated optical flow branches to predict bi-directional flow. \ref{fig:pwcnet_components} provides a detailed view of the decoder blocks in PWCNet, showing the integration of optical flow, cost volume, and warped features at the low-level feature stage.
  • ...and 2 more figures
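
The fixed water threshold mentioned in the Figure 3 caption (intensity below 100 in the near-field) can be illustrated with a short sketch. This is not the authors' labeling code, which the caption says relies on MATLAB's multithresh for the region-wise step. The function name `label_near_field_water` and the `near_field_top_row` parameter, which stands in for however the near-field region is actually delimited, are hypothetical.

```python
WATER_THRESHOLD = 100  # intensity cutoff quoted in the Figure 3 caption


def label_near_field_water(gray, near_field_top_row, threshold=WATER_THRESHOLD):
    """Return a boolean mask marking near-field pixels darker than `threshold`.

    gray: list of rows, each a list of grayscale intensities (0-255).
    near_field_top_row: hypothetical index of the first near-field row;
        rows at or below it (closer to the ship) count as near-field.
    """
    mask = []
    for row_index, row in enumerate(gray):
        in_near_field = row_index >= near_field_top_row
        # A pixel is labeled water only if it is near-field AND dark enough.
        mask.append([in_near_field and value < threshold for value in row])
    return mask
```

For example, calling `label_near_field_water(image, near_field_top_row=1)` on a three-row image leaves the top row unlabeled regardless of intensity and marks only the sub-threshold pixels in the two rows below it.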