Enhancing Maritime Trajectory Forecasting via H3 Index and Causal Language Modelling (CLM)

Nicolas Drapier, Aladine Chetouani, Aurélien Chateigner

TL;DR

The paper tackles global maritime trajectory forecasting using only GNSS data by discretising geographic coordinates with Uber's H3 index (resolution 10) and encoding them into a compact pseudo-octal token vocabulary. A causal language model (Mixtral8x7B) autoregressively predicts sequences of H3-based tokens, enabling long-range trajectory forecasting with a 2560-token sliding window and 220M parameters. Evaluated with Fréchet distance against Kalman filter baselines and the TrAISformer, the approach achieves substantial predictive accuracy and demonstrates global applicability, including a reported capacity to forecast up to 8 hours ahead with 30 minutes of context. The work identifies hallucinations as a characteristic of autoregressive trajectory generation and discusses potential mitigations via RL-based alignment and richer spatial tokens, while outlining future enhancements such as Fourier transforms to further improve modeling of maritime trajectories.

Abstract

The prediction of ship trajectories is a growing field of study in artificial intelligence. Traditional methods rely on the use of LSTM, GRU networks, and even Transformer architectures for the prediction of spatio-temporal series. This study proposes a viable alternative for predicting these trajectories using only GNSS positions. It considers this spatio-temporal problem as a natural language processing problem. The latitude/longitude coordinates of AIS messages are transformed into cell identifiers using the H3 index. Thanks to the pseudo-octal representation, it becomes easier for language models to learn the spatial hierarchy of the H3 index. The method is compared with a classical Kalman filter, widely used in the maritime domain, and introduces the Fréchet distance as the main evaluation metric. We show that it is possible to predict ship trajectories quite precisely up to 8 hours ahead with 30 minutes of context, using solely GNSS positions, without relying on any additional information such as speed, course, or external conditions - unlike many traditional methods. We demonstrate that this alternative works well enough to predict trajectories worldwide.
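
The pseudo-octal idea mentioned in the abstract can be illustrated with a minimal sketch. It relies only on the publicly documented bit layout of an H3 cell index (4 mode bits, 4 resolution bits, 7 base-cell bits, then fifteen 3-bit digits, each in the range 0 to 6 when used). The function names and the `<base_N>` token format are hypothetical and chosen for illustration; the paper's actual vocabulary may differ. The sample index is a resolution-9 cell, whereas the paper works at resolution 10.

```python
def decode_h3_cell(h3_hex: str) -> tuple[int, int, list[int]]:
    """Split an H3 cell index (hex string) into resolution, base cell and digits."""
    h = int(h3_hex, 16)
    mode = (h >> 59) & 0xF          # bits 59-62: index mode (1 = cell)
    if mode != 1:
        raise ValueError("not an H3 cell index")
    resolution = (h >> 52) & 0xF    # bits 52-55: resolution 0..15
    base_cell = (h >> 45) & 0x7F    # bits 45-51: base cell 0..121
    # Digit r (1-indexed) occupies 3 bits; digit 15 sits in the lowest bits.
    digits = [(h >> (3 * (15 - r))) & 0x7 for r in range(1, resolution + 1)]
    return resolution, base_cell, digits


def to_pseudo_octal_tokens(h3_hex: str) -> list[str]:
    """Hypothetical tokenisation: one base-cell token, then one token per digit."""
    _, base_cell, digits = decode_h3_cell(h3_hex)
    return [f"<base_{base_cell}>"] + [str(d) for d in digits]


if __name__ == "__main__":
    # Sample resolution-9 cell index, used here only for illustration.
    print(decode_h3_cell("8928308280fffff"))
    print(to_pseudo_octal_tokens("8928308280fffff"))
```

Because each digit selects one of seven children of the parent cell, two nearby positions share a long common prefix of digits, which is the hierarchy a language model can pick up from such a token sequence.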

Paper Structure

This paper contains 25 sections, 4 equations, 18 figures, 4 tables, and 2 algorithms.

Figures (18)

  • Figure 1: Flowchart of the method presented throughout the article. Green rectangles represent transformation processes, yellow rectangles correspond to examples.
  • Figure 2: Illustration of the hexagons of the H3 index at different resolutions over Manhattan, centred on the latitude/longitude coordinates 40.73129, -73.99288. Resolution 10 is shown in blue, resolution 9 in red and resolution 8 in green.
  • Figure 3: Comparison of maritime trajectories before and after dataset cleaning. (a) Before cleaning, the trajectories exhibit numerous errors, with irregular jumps crossing over land and a high concentration of anomalies covering Europe. These errors indicate a poor capture of the actual ship movements. (b) After applying our cleaning procedure, the trajectories are corrected and closely follow the maritime contours of the continents, removing artefacts and errors that cross landmasses. This process ensures a more accurate representation of the actual maritime routes.
  • Figure 4: Comparison of the Fréchet distance versus Mean Absolute Error (MAE) and Mean Squared Error (MSE). Over a distance of 596 km, the predicted trajectory deviates significantly from the ground truth, making such a prediction unacceptable. The MAE gives the lowest error, but this value is misleadingly low because it compares the trajectories point by point and so does not capture the true nature of the deviation. In contrast, the Fréchet distance accounts for the overall shape of the trajectories, providing a higher and more realistic measure of error.
  • Figure 5: Violin plot of the prediction distance without outliers, highlighting the mean (green star, indicated by the green arrowhead), median, and density peak for a context length of 60 minutes and a prediction length of 90 minutes. See the corresponding violin-plot figures in the prediction annex for the full analysis.
  • ...and 13 more figures
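
The contrast drawn in the Figure 4 caption between point-by-point errors and the Fréchet distance can be sketched as follows. This is a minimal illustration, assuming the discrete Fréchet distance (the standard dynamic-programming formulation) with haversine ground distance in kilometres; the paper may use a different variant or implementation, and the function names here are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, an assumption of this sketch


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def discrete_frechet_km(p: list[tuple[float, float]],
                        q: list[tuple[float, float]]) -> float:
    """Discrete Fréchet distance between two trajectories of (lat, lon) points."""
    if not p or not q:
        raise ValueError("trajectories must be non-empty")
    n, m = len(p), len(q)
    ca = [[0.0] * m for _ in range(n)]  # ca[i][j]: coupling cost up to (i, j)
    for i in range(n):
        for j in range(m):
            d = haversine_km(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[-1][-1]


def mean_pointwise_km(p, q) -> float:
    """Mean point-by-point distance; requires equally long trajectories."""
    if len(p) != len(q):
        raise ValueError("trajectories must have the same length")
    return sum(haversine_km(a, b) for a, b in zip(p, q)) / len(p)
```

For a prediction that starts on the true track and drifts away, `mean_pointwise_km` averages the small early errors with the large late ones, while `discrete_frechet_km` reports the worst separation along the best alignment of the two curves, which is why it yields the larger value in such cases.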