ImPORTance: Machine Learning-Driven Analysis of Global Port Significance and Network Dynamics for Improved Operational Efficiency

Emanuele Carlini, Domenico Di Gangi, Vinicius Monteiro de Lira, Hanna Kavalionak, Amilcar Soares, Gabriel Spadon

TL;DR

This work asks what makes certain ports globally central. It builds a Ports Network from three years of AIS data and predicts port centrality from World Port Index features using a Random Forest classifier. Centrality is defined as $A(p) = \frac{\sum_{c \in \mathscr{C}} z(c,p)}{|\mathscr{C}|}$, an average over the set $\mathscr{C}$ of six centrality measures, including in/out degree, PageRank variants, betweenness, and closeness. SHAP and SAGE provide local and global interpretability, respectively. The study finds that cargo depth and longitude are among the strongest predictors of centrality, with harbor size also contributing, and reports an AUC of up to ~0.88 when identifying central ports. These findings support data-driven planning for port development and resource allocation, and the framework can be extended to other vessel modalities and regions.
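The aggregation in $A(p)$ can be illustrated with a minimal sketch. It assumes that $z(c,p)$ is the standardized (z-score) value of centrality measure $c$ at port $p$, which the summary does not state explicitly. The function names and the toy values are hypothetical, and only two of the six measures are shown.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize a {port: value} mapping to zero mean and unit variance."""
    data = list(values.values())
    mu = mean(data)
    sigma = pstdev(data)
    # A constant measure carries no ranking information; map it to zeros.
    if sigma == 0:
        return {port: 0.0 for port in values}
    return {port: (v - mu) / sigma for port, v in values.items()}


def aggregate_centrality(measures):
    """A(p): average over measures c of the standardized score z(c, p)."""
    standardized = [z_scores(m) for m in measures.values()]
    ports = list(standardized[0])
    return {p: sum(z[p] for z in standardized) / len(standardized) for p in ports}


# Hypothetical toy values, not data from the paper.
measures = {
    "in_degree": {"A": 3, "B": 1, "C": 2},
    "out_degree": {"A": 2, "B": 2, "C": 2},
}
scores = aggregate_centrality(measures)
```

Standardizing before averaging keeps a measure with a large numeric range, such as degree, from dominating one bounded in $[0, 1]$, such as PageRank.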

Abstract

Seaports play a crucial role in the global economy, and researchers have sought to understand their significance through various studies. In this paper, we aim to explore the common characteristics shared by important ports by analyzing the network of connections formed by vessel movement among them. To accomplish this task, we adopt a bottom-up network construction approach that combines three years' worth of AIS (Automatic Identification System) data from around the world, constructing a Ports Network that represents the connections between different ports. Through this representation, we utilize machine learning to assess the relative significance of various port features. Our model examined such features and revealed that geographical characteristics and the port's depth are indicators of a port's importance to the Ports Network. Accordingly, this study employs a data-driven approach and utilizes machine learning to provide a comprehensive understanding of the factors contributing to the importance of ports. Our work aims to inform decision-making processes related to port development, resource allocation, and infrastructure planning within the industry.
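The bottom-up construction can be sketched as follows. This is an illustration rather than the authors' pipeline: it assumes the AIS messages have already been reduced to a chronologically ordered list of port visits per vessel, and the names `build_ports_network`, `degrees`, and the toy port calls are hypothetical.

```python
from collections import Counter


def build_ports_network(port_calls):
    """Count directed port-to-port transitions across all vessels.

    port_calls maps a vessel id to its chronologically ordered port visits.
    """
    edges = Counter()
    for visits in port_calls.values():
        for origin, destination in zip(visits, visits[1:]):
            if origin != destination:  # skip repeated calls at the same port
                edges[(origin, destination)] += 1
    return edges


def degrees(edges):
    """In- and out-degree per port, counting distinct neighbouring ports."""
    in_deg, out_deg = Counter(), Counter()
    for origin, destination in edges:
        out_deg[origin] += 1
        in_deg[destination] += 1
    return in_deg, out_deg


# Hypothetical port-call sequences, not data from the paper.
port_calls = {
    "vessel_1": ["Rotterdam", "Antwerp", "Hamburg"],
    "vessel_2": ["Antwerp", "Hamburg", "Hamburg", "Rotterdam"],
}
edges = build_ports_network(port_calls)
in_deg, out_deg = degrees(edges)
```

The edge weights record how many trips were observed on each connection. The degree counts are the simplest of the centrality measures that can be computed on the resulting directed network.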

Paper Structure (20 sections, 5 equations, 6 figures, 5 tables)

Figures (6)

  • Figure 1: Methodological Framework for Port Centrality and Feature Analysis. This diagram outlines the methodology used to analyze port centrality and its features, following the bottom-up approach to creating our Ports Network. Starting from AIS messages and port databases, the Ports Network is constructed and used to calculate centrality measures. Port features then serve as inputs to the classification task, and the feature analysis identifies the key factors contributing to port centrality.
  • Figure 2: Spatial Distribution and Centrality of Global Maritime Ports. This map displays the locations of major ports worldwide. The size of each circle corresponds to the importance and connectivity of the port within the global maritime network. Larger circles represent ports with higher centrality, indicating their significance in international trade and logistics.
  • Figure 3: SAGE-Derived Feature Importance for Central Port Classification. This bar chart illustrates the global feature importance scores computed using the SAGE method for identifying the top $10\%$ most central ports. Features such as CARGODEPTH, LONGITUDE, HARBORSIZE, and MED_FACIL exhibit the highest contributions, underscoring their relevance in the classification model. The error bars in the image denote the variability in importance across multiple runs.
  • Figure 4: ROC curves from the Random Forest for the binary classification of the most central ports. From left to right, the ports considered relevant are those in the top $5\%$, $10\%$, $15\%$
  • Figure 5: Partial dependence plots for features among the $3$ most important according to SAGE: CARGODEPTH (top) and HARBORSIZE (bottom). The vertical gray bar represents the average value of the feature. The blue partial dependence line is the average model output when the feature at hand is fixed to a given value. The gray histograms on the x-axis indicate the distribution of each feature.
  • ...and 1 more figure
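The classification setup behind the ROC curves in Figure 4 can be sketched in plain Python. It assumes the positive class is the top fraction of ports ranked by aggregated centrality, and it computes AUC through the rank-based identity (the probability that a positive port outscores a negative one) rather than through any specific library. The function names, the toy centrality values, and the model scores are hypothetical.

```python
def label_top_fraction(centrality, fraction):
    """Mark the top `fraction` of ports by centrality as positives (1)."""
    ranked = sorted(centrality, key=centrality.get, reverse=True)
    n_positive = max(1, round(fraction * len(ranked)))
    positives = set(ranked[:n_positive])
    return {port: int(port in positives) for port in centrality}


def roc_auc(labels, scores):
    """AUC as the chance that a positive outranks a negative (ties count 0.5)."""
    pos = [scores[p] for p in labels if labels[p] == 1]
    neg = [scores[p] for p in labels if labels[p] == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative port")
    wins = sum((sp > sn) + 0.5 * (sp == sn) for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical centrality values and classifier scores for ten ports.
centrality = {f"p{i}": float(i) for i in range(10)}
labels = label_top_fraction(centrality, 0.10)

model_scores = {f"p{i}": 0.05 * i for i in range(10)}
model_scores["p8"] = 0.9  # one non-central port is scored too high

auc = roc_auc(labels, model_scores)
```

Changing `fraction` to 0.05 or 0.15 reproduces the other thresholds named in the caption. A stricter threshold leaves fewer positive ports, so each misranked port shifts the AUC more.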