Context-Aware Automated Passenger Counting Data Denoising

Noëlie Cherrier, Baptiste Rérolle, Martin Graive, Amir Dib, Eglantine Schmitt

TL;DR

This work tackles the challenge of noisy APC data for onboard occupancy estimation by proposing a context-aware denoising method framed as a constrained integer linear optimization. It integrates ticketing data and historical priors through a three-stage optimization that first removes outliers, then aligns denoised counts with observations, and finally selects solutions closest to prior distributions. The approach yields robust occupancy estimates across real and simulated networks, offering improved reliability over baselines and acceptable computation times, with potential applicability to downstream tasks like O/D reconstruction. The results demonstrate that incorporating ticketing and historical priors enhances the consistency and interpretability of APC-derived ridership insights, supporting faster, data-driven decisions for transit operators.

Abstract

Reliable and accurate knowledge of ridership in public transportation networks is crucial for public transport operators and public authorities to understand how their network is used and to optimize the transport offering. Several techniques exist to estimate ridership, some of them automated. Among them, Automatic Passenger Counting (APC) systems detect passengers entering and leaving the vehicle at each station of its course. However, data from these systems are often noisy or even biased, resulting in under- or overestimation of onboard occupancy. In this work, we propose a denoising algorithm for APC data to improve their robustness and ease their analysis. The proposed approach consists of a constrained integer linear optimization that takes advantage of ticketing data and historical ridership data to further constrain and guide the optimization. Performance is assessed and compared to other denoising methods on several public transportation networks in France, against manual counts available on one of these networks, and on simulated data.

Paper Structure (20 sections, 5 equations, 9 figures, 4 tables)

Figures (9)

  • Figure 1: A course on the Angers network. Due to biases in the counting-cell measurements, occupancy computed from counting cells data is lower than the one computed from occupancy. It is even negative in the middle of the course, which is not possible.
  • Figure 2: Triangular similarity function $H_i$. The function is characterized by the following two parameters: $x_i^\text{obs}$ the value of the observed count, and $\alpha$ the half margin.
  • Figure 3: Stage I: Maximize the minimal similarity between denoised counts and observed counts. In solution A, alighting 2 is the count with the largest difference between the observed and denoised values. In solution B, it is alighting 3. Solution A is less optimal than solution B because it has a smaller minimal similarity between observed and denoised counts.
  • Figure 4: Stage II: Maximize the sum of similarities between denoised and observed counts. Between solutions A and B, only denoised boarding counts differ as denoised alighting counts are the same. Since the sum of similarities between denoised and observed counts is higher for solution B, solution A is less optimal than solution B.
  • Figure 5: Example of priors for a given line and direction. On average, 70% of passengers board at station 1 and 30% of passengers alight at station 3.
  • ...and 4 more figures
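
The captions of Figures 2 to 4 describe a triangular similarity function and two objectives built on it: the minimal similarity and the sum of similarities. The sketch below illustrates how candidate solutions could be ranked on those two criteria. It is not the authors' implementation: the peak value of 1, the clipping to 0 outside the margin, the single shared `alpha`, the function names, and the example counts are all assumptions made for illustration, and the constraints of the integer linear program are not modeled here.

```python
def triangular_similarity(x, x_obs, alpha):
    """Similarity between a denoised count x and an observed count x_obs.

    Assumed shape: 1.0 when x == x_obs, decreasing linearly to 0.0
    when |x - x_obs| reaches the half margin alpha, and 0.0 beyond.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return max(0.0, 1.0 - abs(x - x_obs) / alpha)


def min_similarity(denoised, observed, alpha):
    """Minimal-similarity criterion: the worst similarity over all counts."""
    return min(triangular_similarity(d, o, alpha)
               for d, o in zip(denoised, observed))


def sum_similarity(denoised, observed, alpha):
    """Sum-of-similarities criterion over all counts."""
    return sum(triangular_similarity(d, o, alpha)
               for d, o in zip(denoised, observed))


# Hypothetical alighting counts at three stations (illustrative values only).
observed = [10, 4, 6]
solution_a = [10, 4, 2]   # matches two counts exactly, one is far off
solution_b = [9, 5, 4]    # spreads the correction over all counts
alpha = 4                 # assumed half margin, shared by all counts

min_a = min_similarity(solution_a, observed, alpha)
min_b = min_similarity(solution_b, observed, alpha)
sum_a = sum_similarity(solution_a, observed, alpha)
sum_b = sum_similarity(solution_b, observed, alpha)
```

With these made-up values, both candidates have the same sum of similarities, but solution B has the higher minimal similarity, so a max-min criterion prefers it. This is the kind of situation in which the minimal-similarity stage discriminates between solutions that the sum alone would not separate.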