Review of Passenger Flow Modelling Approaches Based on a Bibliometric Analysis
Jonathan Hecht, Weilian Li, Ziyue Li, Youness Dehbi
TL;DR
This study provides the first comprehensive bibliometric analysis of short-term passenger flow forecasting in local public transit (1984–2024), combining performance analysis, science mapping, and BERTopic-based topic modelling. It reveals a post-2008 surge in research, a shift from traditional statistical and basic ML methods (e.g., ARIMA, SVM) toward specialised deep learning architectures (LSTM, CNN, GCN), and a strong concentration of rail-focused work in China. The analysis identifies gaps in data fusion, open multivariate data, interpretability, and deployment practicality, while highlighting increasing connections to foundation-model research. The findings suggest future work will be shaped by large-scale pre-training and fine-tuning for transit-specific tasks, balanced against concerns about data availability, cost, and real-world applicability.
Abstract
This paper presents a bibliometric analysis of short-term passenger flow forecasting in local public transit, covering 814 publications from 1984 to 2024. In addition to common bibliometric analysis tools, a variant of a citation network was developed and topic modelling was conducted. The analysis reveals that research activity was sporadic prior to 2008 and accelerated markedly thereafter, accompanied by a shift from conventional statistical and machine learning methods (e.g., ARIMA, SVM, and basic neural networks) to specialised deep learning architectures. Based on this insight, a connection to more general fields such as machine learning and time series modelling was established. Beyond modelling approaches, spatial, linguistic, and modal biases were identified, and findings from existing secondary literature were validated and quantified. This revealed gaps such as limited data fusion, limited availability of open (multivariate) data, and underappreciated challenges concerning model interpretability, cost-efficiency, and the balance between algorithmic performance and practical deployment considerations. In relation to these broader fields, the growing relevance of foundation models is also noteworthy.
