Calibration and Evaluation of Car-Following Models for Autonomous Shuttles Using a Novel Multi-Criteria Framework

Renan Favero, Lily Elefteriadou

TL;DR

This paper addresses the need for AS-specific car-following models and a standardized evaluation framework by calibrating a diverse set of models on field AS trajectory data and introducing a multi-criteria assessment. It finds that tree-based ML models, particularly XGBoost, offer the best overall performance, while sequential models capture long-term stability but lag in short-term responsiveness. A three-dimensional evaluation framework (prediction error, trajectory stability, and trajectory similarity) reveals trade-offs among model types, with ML approaches generally outperforming traditional IDM/ACC baselines. The framework provides a transferable method for practitioners to select models that balance accuracy, stability, and realism, enabling more faithful AS simulations and informed deployment decisions in urban settings. These results have practical implications for predicting AS impacts on capacity, stability, and safety before large-scale deployment.

Abstract

Autonomous shuttles (AS) are fully autonomous transit vehicles with operating characteristics distinct from conventional autonomous vehicles (AV). Developing dedicated car-following models for AS is critical to understanding their traffic impacts; however, few studies have calibrated such models with field data. More advanced machine learning (ML) techniques have not yet been applied to AS trajectories, leaving the potential of ML for capturing AS dynamics unexplored and constraining the development of dedicated AS models. Furthermore, there is no unified framework for systematically evaluating and comparing how well car-following models replicate real trajectories. Existing car-following studies often rely on disparate metrics, which limits reproducibility and performance comparability. This study addresses these gaps through two main contributions: (1) the calibration of a diverse set of car-following models using real-world AS trajectory data, including eight machine learning algorithms and two physics-based models; and (2) the introduction of a multi-criteria evaluation framework that integrates measures of prediction accuracy, trajectory stability, and statistical similarity, which provides a generalizable methodology for a systematic assessment of car-following models. Results indicated that the proposed calibrated XGBoost model achieved the best overall performance. Sequential models, such as LSTM and CNN, captured long-term positional stability but were less responsive to short-term dynamics. Traditional models (IDM, ACC) and kernel methods showed lower accuracy and stability than most ML models tested.

Paper Structure (23 sections, 1 equation, 3 figures, 2 tables)

Figures (3)

  • Figure 1: Metrics (non-normalized) calculated to obtain the error, trajectory stability, and trajectory similarity scores.
  • Figure 2: Aggregated Z-scores across evaluation dimensions.
  • Figure 3: Aggregated Z-Scores for all models. Negative scores indicate better performance.
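
The figure captions above refer to Z-scores aggregated across the error, trajectory stability, and trajectory similarity dimensions, with negative scores indicating better performance. The following is a minimal sketch of how such an aggregation could work, not the authors' implementation. It assumes that every metric is "lower is better", that metrics are standardized across models, and that dimensions are averaged with equal weight. The function name, metric names (`rmse`, `mae`, `jerk_std`, `dtw`), model labels, and all numeric values are hypothetical placeholders, not results from the paper.

```python
from statistics import mean, pstdev


def aggregate_z_scores(metrics, dimensions):
    """Standardize each metric across models, then average within each dimension.

    metrics:    {model: {metric_name: value}}, every metric assumed lower-is-better
    dimensions: {dimension_name: [metric_name, ...]}
    Returns     {model: {dimension_name: mean z-score, ..., "overall": mean over dimensions}}
    """
    models = list(metrics)
    scores = {m: {} for m in models}
    for dim, names in dimensions.items():
        per_metric_z = []
        for name in names:
            column = [metrics[m][name] for m in models]
            mu, sigma = mean(column), pstdev(column)
            # If a metric is identical for all models it carries no ranking
            # information, so its z-score is set to 0 (an assumption).
            per_metric_z.append(
                [(v - mu) / sigma if sigma > 0 else 0.0 for v in column]
            )
        for i, m in enumerate(models):
            scores[m][dim] = mean(z[i] for z in per_metric_z)
    for m in models:
        # Equal weighting of dimensions is an assumption of this sketch.
        scores[m]["overall"] = mean(scores[m][d] for d in dimensions)
    return scores


# Placeholder values for illustration only; these are NOT results from the paper.
example_metrics = {
    "model_a": {"rmse": 0.10, "mae": 0.08, "jerk_std": 0.20, "dtw": 1.0},
    "model_b": {"rmse": 0.20, "mae": 0.15, "jerk_std": 0.30, "dtw": 2.0},
    "model_c": {"rmse": 0.30, "mae": 0.25, "jerk_std": 0.25, "dtw": 3.0},
}
example_dimensions = {
    "error": ["rmse", "mae"],
    "stability": ["jerk_std"],
    "similarity": ["dtw"],
}

scores = aggregate_z_scores(example_metrics, example_dimensions)
# The most negative overall score marks the best-performing model.
best = min(scores, key=lambda m: scores[m]["overall"])
```

Because each metric is standardized across the set of models being compared, the resulting scores are relative: adding or removing a model changes every other model's Z-score, so rankings are only meaningful within a single comparison set.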