
Physics-informed offline reinforcement learning eliminates catastrophic fuel waste in maritime routing

Aniruddha Bora, Julie Chalfant, Chryssostomos Chryssostomidis

Abstract

International shipping produces approximately 3% of global greenhouse gas emissions, yet voyage routing remains dominated by heuristic methods. We present PIER (Physics-Informed, Energy-efficient, Risk-aware routing), an offline reinforcement learning framework that learns fuel-efficient, safety-aware routing policies from physics-calibrated environments grounded in historical vessel tracking data and ocean reanalysis products, requiring no online simulator. Validated on one full year (2023) of AIS data across seven Gulf of Mexico routes (840 episodes per method), PIER reduces mean CO2 emissions by 10% relative to great-circle routing. However, PIER's primary contribution is eliminating catastrophic fuel waste: great-circle routing incurs extreme fuel consumption (>1.5x median) in 4.8% of voyages; PIER reduces this to 0.5%, a 9-fold reduction. Per-voyage fuel variance is 3.5x lower (p<0.001), with bootstrap 95% CI for mean savings [2.9%, 15.7%]. Partial validation against observed AIS vessel behavior confirms consistency with the fastest real transits while exhibiting 23.1x lower variance. Crucially, PIER is forecast-independent: unlike A* path optimization whose wave protection degrades 4.5x under realistic forecast uncertainty, PIER maintains constant performance using only local observations. The framework combines physics-informed state construction, demonstration-augmented offline data, and a decoupled post-hoc safety shield, an architecture that transfers to wildfire evacuation, aircraft trajectory optimization, and autonomous navigation in unmapped terrain.
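The two headline statistics in the abstract, the share of voyages with extreme fuel use (>1.5x median) and the bootstrap 95% CI for mean savings, can be illustrated with a short sketch. This is not the authors' evaluation code. The function names, the choice of a paired percentile bootstrap, and the option to pass a shared reference median are assumptions made for illustration; the abstract does not say which median the threshold refers to or how the resampling was done.

```python
import numpy as np

def tail_fraction(fuel, factor=1.5, reference_median=None):
    """Share of voyages whose fuel use exceeds `factor` times a median.

    Assumption: by default the median of `fuel` itself is used; pass
    `reference_median` to compare several methods against one threshold.
    """
    fuel = np.asarray(fuel, dtype=float)
    med = np.median(fuel) if reference_median is None else reference_median
    return float(np.mean(fuel > factor * med))

def bootstrap_mean_savings_ci(baseline, candidate, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for 1 - mean(candidate) / mean(baseline).

    Assumption: voyages are paired (same route and departure), so the same
    resampled indices are applied to both arrays.
    """
    baseline = np.asarray(baseline, dtype=float)
    candidate = np.asarray(candidate, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(baseline)
    idx = rng.integers(0, n, size=(n_boot, n))  # one row of indices per resample
    savings = 1.0 - candidate[idx].mean(axis=1) / baseline[idx].mean(axis=1)
    lo, hi = np.quantile(savings, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)
```

A wide interval such as the reported [2.9%, 15.7%] is what one would expect when mean savings are driven by a few tail events: resamples that happen to include the extreme great-circle voyages show large savings, and those that miss them show small ones.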

Paper Structure (20 sections, 8 equations, 14 figures, 11 tables)


Figures (14)

  • Figure 1: PIER framework and Gulf of Mexico routing domain. a, The PIER pipeline: AIS vessel tracking data and ocean reanalysis products (Copernicus Marine Service, NOAA CoastWatch ERDDAP) are fused into physics-informed state features (speed-loss model, hull-fatigue risk layer, 15 normalized features). An IQL agent is trained offline on a mixed dataset of A*-optimal teacher demonstrations and stochastic behavioral roll-outs. At inference, a post-hoc safety shield enforces land-avoidance and wave-exposure constraints before each routing decision. b, Gulf of Mexico routing domain with nine key ports (amber dots). Seven port-to-port routes are evaluated: Brownsville$\to$Key West, Corpus Christi$\to$Galveston, Corpus Christi$\to$Tampa, Galveston$\to$Key West, Mobile$\to$Tampa, New Orleans$\to$Tampa, and Port Arthur$\to$Mobile. c--f, Seasonal mean normalized HF (hull fatigue) exposure from Copernicus wave reanalysis, showing elevated hazard in winter (DJF) and fall (SON) compared with summer (JJA), particularly in the eastern Gulf and Florida Straits. This seasonal variation is the primary driver of PIER's fuel savings: weather-aware routing adds the most value when conditions are adverse.
  • Figure 2: Route performance across all 12 months of 2023. a, Arrival rate (percentage of episodes reaching destination) for each route and month. Brownsville$\to$Key West and Galveston$\to$Key West maintain 100% arrival year-round; Port Arthur$\to$Mobile shows persistent failures due to grid resolution limitations. b, Mean HF wave exposure by route and month. Brownsville$\to$Key West and Galveston$\to$Key West experience the highest exposure, peaking in winter (February) and fall (October--November), confirming these as the routes where weather-aware routing adds the most value. Missing cells indicate no arrived episodes.
  • Figure 3: Ablation study: contribution of each PIER component (2023, 12 months). a, Arrival rate. Removing the safety shield causes the largest drop ($-$6 percentage points), followed by physics features ($-$6 pp) and HF-risk awareness ($-$5 pp). Teacher demonstrations contribute the least ($-$3 pp). b, Mean transit time ($\pm$ std). Removing components generally increases both mean and variance. c, Mean HF wave exposure ($\pm$ std). The full model achieves the lowest exposure; removing the shield or physics features increases wave encounter risk. Dashed lines indicate full-model performance for reference.
  • Figure 4: PIER's CO2 advantage is concentrated in the tail of the distribution. a, Per-voyage CO2 distributions for PIER (teal) and great-circle routing (grey). Medians are nearly identical ($\sim$215--218 t), but great-circle routing has a heavy right tail extending beyond 1,600 t (red shaded region: $>$1.5$\times$ median). b, CO2 at successive quantiles. Savings grow from 1% at the median to 6% at the 95th percentile and 70% at the maximum, demonstrating that PIER's value is concentrated in eliminating worst-case fuel events. c, Per-route CO2 standard deviation. Great-circle variance exceeds PIER by 3--35$\times$ depending on route length, with the largest ratios on long cross-Gulf corridors. d, Mean versus median savings by season. Median savings are consistently 1--2% across all seasons, while mean savings vary from 5% to 24%, confirming that elevated means reflect tail-event elimination rather than systematic improvement.
  • Figure 5: Archipelago Basin simulated environment. a, Domain geometry with three experimental routes: S1 (open water, jet crossing), S2 (constrained corridor, peninsula navigation), S3 (storm crossing, HF-risk optimization). b, Significant wave height $H_s$ at $t=0$ h showing the Gaussian storm in the northeast quadrant. c, Ocean current speed with the zonal jet at $27.85^{\circ}$N. d, Hull-fatigue risk index peaking near the storm centre.
  • ...and 9 more figures
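
The post-hoc safety shield described in the Figure 1 caption (land avoidance and wave-exposure constraints applied before each routing decision) can be pictured as action masking on top of a trained policy. The sketch below is a hypothetical illustration, not the paper's implementation. The eight-move grid action set, the function name `shielded_action`, the use of significant wave height as the exposure measure, the 4.0 m placeholder limit, and the fallback rule are all assumptions.

```python
import numpy as np

# Hypothetical action set: eight compass moves as (row, col) offsets on a regular grid.
MOVES = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]

def shielded_action(scores, pos, land_mask, wave_height, hs_limit=4.0):
    """Pick the best-scoring action that passes the shield.

    scores      : per-action preference from the learned policy (higher is better)
    pos         : current (row, col) grid cell
    land_mask   : boolean array, True where the cell is land
    wave_height : array of significant wave height per cell (assumed metres)
    hs_limit    : placeholder wave-exposure limit, not a value from the paper

    Returns None if every neighbouring cell is land or out of bounds.
    """
    rows, cols = land_mask.shape
    water = []  # (action index, wave height at the next cell)
    for a, (dr, dc) in enumerate(MOVES):
        r, c = pos[0] + dr, pos[1] + dc
        if 0 <= r < rows and 0 <= c < cols and not land_mask[r, c]:
            water.append((a, wave_height[r, c]))
    if not water:
        return None
    safe = [a for a, hs in water if hs <= hs_limit]
    if safe:
        # The policy's ranking decides among the actions the shield allows.
        return max(safe, key=lambda a: scores[a])
    # Assumed fallback: no move meets the wave limit, so take the calmest water cell.
    return min(water, key=lambda t: t[1])[0]
```

Because the shield only filters actions at inference time, it stays decoupled from training, which is consistent with the caption's description of it as post-hoc: the constraint set can change without retraining the policy.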