Congestion Forecast for Trains with Railroad-Graph-based Semi-Supervised Learning using Sparse Passenger Reports

Soto Anno, Kota Tsubouchi, Masamichi Shimosaka

TL;DR

SURCONFORT, a semi-supervised method for congestion forecasting for trains, is proposed. It adopts semi-supervised learning to leverage sparsely labeled data together with many unlabeled data, and it designs a railway network-oriented graph that is applied to semi-supervised graph regularization.

Abstract

Forecasting rail congestion is crucial for efficient mobility in transport systems. We present rail congestion forecasting using reports from passengers collected through a transit application. Although passenger reports have received attention from researchers, securing a sufficient volume of reports is challenging because passengers are reluctant to post them. The limited number of reports makes the congestion labels sparse, which can be an issue in building a stable prediction model. To address this issue, we propose a semi-supervised method for congestion forecasting for trains, or SURCONFORT. Our key idea is twofold. First, we adopt semi-supervised learning to leverage sparsely labeled data and many unlabeled data. Second, in order to complement the unlabeled data from nearby stations, we design a railway network-oriented graph and apply the graph to semi-supervised graph regularization. Empirical experiments with actual reporting data show that SURCONFORT improved the forecasting performance by 14.9% over state-of-the-art methods under label sparsity.
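The graph regularization idea in the abstract can be illustrated with a generic graph-Laplacian penalty. The following is a minimal sketch, not the paper's implementation: it assumes one descriptor vector per station and an unweighted adjacency matrix built from station-to-station links. The function names (`railway_adjacency`, `graph_regularizer`, `pairwise_penalty`) and the toy three-station line are hypothetical and chosen only for illustration.

```python
import numpy as np


def railway_adjacency(num_stations, edges):
    """Build a symmetric 0/1 adjacency matrix from (i, j) station links.

    Assumption: adjacent stations on the rail network are linked with weight 1.
    """
    W = np.zeros((num_stations, num_stations))
    for i, j in edges:
        W[i, j] = 1.0
        W[j, i] = 1.0
    return W


def graph_regularizer(Z, W):
    """Laplacian penalty tr(Z^T L Z) with L = D - W.

    Z holds one descriptor per station (rows). The value is small when
    linked stations have similar descriptors.
    """
    L = np.diag(W.sum(axis=1)) - W
    return float(np.trace(Z.T @ L @ Z))


def pairwise_penalty(Z, W):
    """Equivalent form: 0.5 * sum_ij W_ij * ||z_i - z_j||^2."""
    diff = Z[:, None, :] - Z[None, :, :]
    return float(0.5 * np.sum(W * np.sum(diff ** 2, axis=-1)))


# Toy example: three stations on a line, A - B - C.
W = railway_adjacency(3, [(0, 1), (1, 2)])

# Descriptors of neighbouring stations close together vs. far apart.
Z_close = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0]])
Z_far = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])

penalty_close = graph_regularizer(Z_close, W)
penalty_far = graph_regularizer(Z_far, W)
```

In a semi-supervised setup of this kind, such a penalty would typically be added, with some weight, to a supervised loss computed on the labeled samples only, so that unlabeled samples still influence the learned descriptors through their graph neighbours.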

Paper Structure

This paper contains 31 sections, 7 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: (a) Transit guidance screen and congestion status posting screen of LY Corporation Transit Navigation. App users post the status of congestion inside the trains after boarding. (b) Overview of our problem and SURCONFORT, in which the key idea is twofold: semi-supervised learning and a railroad network-oriented graph.
  • Figure 2: Conceptual illustration of our problem (left) and the descriptor space in an ideal state (right). The shape of the data point represents the sample congestion at each adjacent station (circle for station A, star for station B, triangle for station C), and the three colors represent the level of congestion at each station (blue for congestion level 1, orange for 2, red for 3). Samples missing labels in the UGC data are in gray, and the actual congestion level is reflected in the border's color. We learn the degree of congestion based on the UGC data associated with each station and build an ideal descriptor space (i.e., a cluster of descriptors for each congestion level) on the predictive model.
  • Figure 3: Conceptual illustration of the descriptor space formed by fully-supervised methods (left) and LP-DeepSSL (Iscen et al., CVPR 2019) (right). Fully-supervised methods extract and learn limited patterns from a small amount of data and thus suffer from overfitting, where samples belonging to the same congestion level form separate clusters for each station or context. LP-DeepSSL, which is built on this basis, is prone to assigning wrong pseudo-labels (represented by unidirectional arrows) when performing label propagation on the descriptor space.
  • Figure 4: Conceptual illustration of SURCONFORT, which corrects the internal representation of the feature extractor by using graph regularization to create an ideal descriptor space. The graph regularization is based on a rail network-oriented graph so that the descriptors of adjacent stations are close to each other in the descriptor space.
  • Figure 5: Prediction performance w.r.t. $\zeta_{\mathrm{G}}$.
  • ...and 2 more figures