Review of Passenger Flow Modelling Approaches Based on a Bibliometric Analysis

Jonathan Hecht, Weilian Li, Ziyue Li, Youness Dehbi

TL;DR

This study provides the first comprehensive bibliometric analysis of short-term passenger flow forecasting in local public transit (1984–2024), combining performance analysis, science mapping, and BERTopic-based topic modelling. It reveals a post-2008 surge in research, a shift from traditional statistical and basic ML methods (e.g., ARIMA, SVM) toward specialised deep learning architectures (LSTM, CNN, GCN), and a strong concentration of rail-focused work in China. The analysis identifies gaps in data fusion, open multivariate data, interpretability, and deployment practicality, while highlighting increasing connections to foundation-model research. The findings suggest future work will be shaped by large-scale pre-training and fine-tuning for transit-specific tasks, balanced against concerns about data availability, cost, and real-world applicability.

Abstract

This paper presents a bibliometric analysis of the field of short-term passenger flow forecasting within local public transit, covering 814 publications that span from 1984 to 2024. In addition to common bibliometric analysis tools, a variant of a citation network was developed, and topic modelling was conducted. The analysis reveals that research activity exhibited sporadic patterns prior to 2008, followed by a marked acceleration, characterised by a shift from conventional statistical and machine learning methodologies (e.g., ARIMA, SVM, and basic neural networks) to specialised deep learning architectures. Based on this insight, a connection to more general fields such as machine learning and time series modelling was established. In addition to modelling, spatial, linguistic, and modal biases were identified and findings from existing secondary literature were validated and quantified. This revealed existing gaps, such as constrained data fusion, open (multivariate) data, and underappreciated challenges related to model interpretability, cost-efficiency, and a balance between algorithmic performance and practical deployment considerations. In connection with the superordinate fields, the growth in relevance of foundation models is also noteworthy.

Paper Structure

This paper contains 15 sections, 1 equation, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Illustration of the approach used in this study. Data collection and pre-processing come first, followed by several analysis techniques whose results allow conclusions to be drawn about passenger flow modelling.
  • Figure 2: Scientific production analysis. (a) Productivity of research constituents by publication type. Until 2008, only a small number of publications appeared. Between 2008 and 2017, the number of publications increased, whereas the mix of publication types remained comparable. From 2017 until today, the total number has increased faster, with a higher number of articles published, and the first book chapters appeared. (b) Country of scientific production based on the corresponding author. With 72%, China has by far the highest scientific production, followed by the USA with 4% and India with 2%. Note that the corresponding author is missing for 11.1% of the publications, which skews the depiction.
  • Figure 3: Top ten sources by number of publications. There is no single main publication channel. The figure understates the importance of conferences, because each conference is evaluated individually. For example, the International Conference of Transportation Professionals (CICTP) has cumulatively contributed 35 publications over the years, with a peak of seven publications in 2019.
  • Figure 4: Using embeddings of the abstracts generated with the "MiniLM-L6-v2" model within the SentenceTransformers framework, eleven distinct clusters were identified. Documents that lack a clear association with any cluster are categorised as ambiguous (grey). The topic representation, produced with the "KeyBERTInspired" function, illustrates the various transport modes and methods used.
  • Figure 5: Clustered co-occurrence patterns obtained with a Biterm Topic Model (BTM). Each cluster corresponds to a content topic. A higher percentage indicates a higher probability of that word combination occurring in the corpus of documents.
  • ...and 4 more figures
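
The word-combination percentages described for Figure 5 rest on the notion of a biterm, an unordered pair of words that co-occur in the same short text. The following is a minimal sketch of that counting step only, not the authors' pipeline: it computes the empirical share of each biterm in a hypothetical toy corpus and does not perform the topic inference of a full Biterm Topic Model. The function names and the example documents are illustrative assumptions, and repeated words within a document are collapsed for simplicity.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return all unordered pairs of distinct words from one short document.

    Simplifying assumption: repeated words are collapsed before pairing.
    """
    unique = sorted(set(tokens))
    return list(combinations(unique, 2))


def biterm_probabilities(documents):
    """Empirical probability of each biterm over the whole corpus."""
    counts = Counter()
    for doc in documents:
        counts.update(extract_biterms(doc.lower().split()))
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


# Hypothetical toy corpus (not taken from the paper's data).
docs = [
    "metro passenger flow",
    "bus passenger flow",
    "metro flow forecasting",
]

probs = biterm_probabilities(docs)

# Print the most frequent biterms with their share of all biterm occurrences.
for pair, p in sorted(probs.items(), key=lambda kv: (-kv[1], kv[0])):
    print(pair, f"{p:.1%}")
```

A full BTM would additionally model each biterm as drawn from a latent topic and infer those topics, for example by Gibbs sampling; the sketch stops at the raw co-occurrence shares.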