
A Gradient Boosted Mixed-Model Machine Learning Framework for Vessel Speed in the U.S. Arctic

Mauli Pant, Linda Fernandez, Indranil Sahoo

TL;DR

Arctic vessel operating regimes relevant to speed management and corridor-level assessment are empirically characterized using a two-stage machine learning framework.

Abstract

Understanding how environmental and operational conditions influence vessel speed is crucial for characterizing navigational conditions in the Arctic. We analyzed Automatic Identification System (AIS) data from 2010-2019 to examine vessel speed over ground (SOG). Over half of the AIS records showed zero SOG, and treating zero and positive SOG as a single continuous process can obscure important patterns. We therefore applied a two-stage machine learning framework, first modeling the probability of SOG greater than zero and then modeling SOG conditional on being positive. AIS observations were integrated with sea ice concentration, course over ground, wind, bathymetric depth, distance to coast, vessel group, and navigational status. Gradient boosted decision trees with random effects captured nonlinear environmental responses while accounting for repeated observations. The positive SOG classifier achieved strong discrimination (AUC = 0.85), while the conditional speed model explained approximately 77 percent of out-of-fold variance. SHAP values quantified covariate effects by decomposing model predictions into additive contributions from individual variables. Distance to coast and bathymetric depth were dominant determinants of both the likelihood and magnitude of vessel speed, while changes in course, vessel group, and navigational status introduced secondary variation. Wind and sea ice effects were modest. Together, these results empirically characterize Arctic vessel operating regimes relevant to speed management and corridor-level assessment.
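The two-stage structure described above can be illustrated with a minimal sketch. It is not the authors' implementation: per-group frequencies and means stand in for the gradient boosted models with random effects, and the records, group labels, and function names are hypothetical. Stage 1 estimates the probability that SOG is positive, and stage 2 estimates mean SOG among positive records only. If an unconditional expectation is wanted, the law of total expectation gives it as the product of the two stages; whether the paper combines the stages this way is an assumption here.

```python
from collections import defaultdict

# Hypothetical toy records: (distance-to-coast band, SOG in knots).
records = [
    ("nearshore", 0.0), ("nearshore", 0.0), ("nearshore", 0.0), ("nearshore", 4.0),
    ("offshore", 0.0), ("offshore", 10.0), ("offshore", 12.0), ("offshore", 14.0),
]

def fit_two_stage(rows):
    """Fit simple per-group stand-ins for the two stages.

    Stage 1: P(SOG > 0) as the share of positive-speed records in the group.
    Stage 2: E[SOG | SOG > 0] as the mean speed over positive records only.
    """
    n = defaultdict(int)
    n_pos = defaultdict(int)
    sum_pos = defaultdict(float)
    for group, sog in rows:
        n[group] += 1
        if sog > 0:
            n_pos[group] += 1
            sum_pos[group] += sog
    model = {}
    for group in n:
        p_moving = n_pos[group] / n[group]
        # Zero records never enter the stage-2 estimate.
        mean_if_moving = sum_pos[group] / n_pos[group] if n_pos[group] else 0.0
        model[group] = (p_moving, mean_if_moving)
    return model

def expected_sog(model, group):
    """Combine the stages: E[SOG] = P(SOG > 0) * E[SOG | SOG > 0]."""
    p_moving, mean_if_moving = model[group]
    return p_moving * mean_if_moving

model = fit_two_stage(records)
for group in sorted(model):
    p, m = model[group]
    print(group, "P(moving) =", p, "mean SOG if moving =", m,
          "expected SOG =", expected_sog(model, group))
```

Keeping the stages separate means the zero records inform only the probability of movement, so the large mass at SOG = 0 does not pull down the estimate of how fast vessels travel once under way.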

Paper Structure (15 sections, 9 equations, 14 figures, 4 tables)

Figures (14)

  • Figure 1: Distribution of environmental and spatial covariates by zero versus positive SOG (July–October, 2010–2019). Density curves compare AIS observations with SOG = 0 and SOG $>$ 0. Wind effects are represented by along-track and cross-track components (m$\cdot$s$^{-1}$), consistent with the regression specification. Axes use variable-specific scales to emphasize within-variable distributional differences.
  • Figure 2: Spatial distribution of binary ice-related speed risk across the U.S. Arctic (July–October, 2010–2019). Grid cell shading indicates the proportion of AIS observations exceeding an ice-dependent safe speed threshold (Figure A1). Black triangles mark the ten highest vessel density locations, scaled by a relative density index. Bathymetric depth contours provide spatial context.
  • Figure 3: ROC curve with AUC (out-of-fold predictions). The close overlap of curves across folds indicates that classifier performance is robust to spatial, temporal, and vessel-level heterogeneity and generalizes well beyond individual training sets.
  • Figure 4: Precision-recall curve for the zero versus positive SOG classification model, based on out-of-fold predictions. Color indicates the decision threshold, illustrating the trade-off between precision and recall across operating points.
  • Figure 5: Out-of-fold calibration plot for the GPBoost model. Mean observed and predicted vessel speeds are compared across equal-frequency bins of predicted SOG. The dashed line shows perfect calibration, and the smooth curve highlights systematic deviations.
  • ...and 9 more figures
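
The binned comparison behind a calibration plot such as Figure 5 can be sketched as follows. This is an illustrative reading of the caption, not the authors' code: the function name, the number of bins, and the speed values are hypothetical. Predictions are sorted and split into bins holding equal numbers of observations, then the mean predicted and mean observed speed are computed in each bin. Under perfect calibration the two means coincide.

```python
def calibration_bins(predicted, observed, n_bins):
    """Return (mean predicted, mean observed) for each equal-frequency bin.

    Bins are formed by sorting on the predicted value and cutting the sorted
    sequence into n_bins chunks of (nearly) equal size.
    """
    pairs = sorted(zip(predicted, observed))  # order by predicted value
    n = len(pairs)
    result = []
    for b in range(n_bins):
        lo = b * n // n_bins
        hi = (b + 1) * n // n_bins
        chunk = pairs[lo:hi]
        if not chunk:
            continue  # can happen only when n_bins exceeds the sample size
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        mean_obs = sum(o for _, o in chunk) / len(chunk)
        result.append((mean_pred, mean_obs))
    return result

# Hypothetical out-of-fold predictions and observed speeds, in knots.
predicted = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
observed = [1.0, 5.0, 7.0, 7.0, 9.0, 15.0]

bins = calibration_bins(predicted, observed, n_bins=3)
for mean_pred, mean_obs in bins:
    print("mean predicted:", mean_pred, "mean observed:", mean_obs)
```

In this toy data the first two bins sit on the diagonal, while the last bin has a higher observed mean than predicted mean, which is the kind of systematic deviation a smoothed calibration curve would highlight.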