Learning Heuristics for Transit Network Design and Improvement with Deep Reinforcement Learning

Andrew Holliday; Ahmed El-Geneidy; Gregory Dudek

TL;DR

This paper introduces the Neural Evolutionary Algorithm (NEA), a deep reinforcement learning (DRL) approach that learns neural heuristics to guide transit network design within an evolutionary framework. A graph neural network policy is trained with PPO on a construction MDP, where it learns to compose and extend routes on a city graph. Used as a heuristic inside an evolutionary algorithm, the policy outperforms a traditional evolutionary baseline on benchmark cities with 70 nodes or more, achieves new state-of-the-art results on the challenging Mumford benchmark, and, in a simulated case study of Laval, Canada, yields cost savings of up to 19% over the city's existing transit network. Limitations include the policy being trained in a construction setting and the need for broader policy diversification; future work could extend neural heuristics to more operators and to multi-objective metaheuristics for real-world deployment.

Abstract

Planning a network of public transit routes is a challenging optimization problem. Metaheuristic algorithms search through the space of possible transit networks by applying heuristics that randomly alter routes in a network. Existing algorithms almost exclusively use heuristics that modify the network in purely random ways. In this work, we explore whether we can obtain better transit networks using more intelligent heuristics that modify networks according to a learned preference function instead of at random. We use reinforcement learning to train graph neural nets to act as heuristics. These neural heuristics yield improved results on benchmark synthetic cities with 70 nodes or more, and achieve new state-of-the-art results on the challenging Mumford benchmark. They also improve upon a simulation of the real transit network in the city of Laval, Canada, achieving cost savings of up to 19% over the city's existing transit network.
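The contrast the abstract draws between purely random heuristics and heuristics guided by a preference function can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy graph `ADJACENCY`, the demand table `DEMAND`, the helper names, and in particular `demand_score`, which is a hand-written stand-in for the learned preference function and not the paper's graph neural net. The softmax sampling is likewise one plausible way to turn scores into choices, not necessarily the authors' method.

```python
import math
import random

def candidate_extensions(route, adjacency):
    """List every way to extend a route by one node at either end
    without revisiting a node already on the route."""
    options = []
    for end, insert_at in ((route[0], 0), (route[-1], len(route))):
        for nxt in adjacency[end]:
            if nxt not in route:
                options.append(route[:insert_at] + [nxt] + route[insert_at:])
    return options

def random_heuristic(route, adjacency, rng):
    """Purely random heuristic: pick any valid extension uniformly."""
    options = candidate_extensions(route, adjacency)
    return rng.choice(options) if options else route

def preference_heuristic(route, adjacency, rng, score):
    """Preference-guided heuristic: sample an extension with probability
    proportional to exp(score), i.e. a softmax over candidate scores."""
    options = candidate_extensions(route, adjacency)
    if not options:
        return route
    scores = [score(option) for option in options]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    return rng.choices(options, weights=weights, k=1)[0]

# Hypothetical toy city: node 1 is a hub linked to nodes 0, 2 and 3.
ADJACENCY = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
# Hypothetical demand: many trips between nodes 0 and 3, none elsewhere.
DEMAND = {(0, 3): 50}

def demand_score(route):
    """Stand-in preference function (NOT a trained model): total demand
    between all pairs of nodes that the route serves directly."""
    return sum(
        DEMAND.get((a, b), 0) + DEMAND.get((b, a), 0)
        for i, a in enumerate(route)
        for b in route[i + 1:]
    )

rng = random.Random(0)
print(random_heuristic([0, 1], ADJACENCY, rng))
print(preference_heuristic([0, 1], ADJACENCY, rng, demand_score))
```

In this toy setting the random heuristic is equally likely to extend the route towards node 2 or node 3, while the preference-guided one almost always extends towards node 3, where the demand is. A metaheuristic would call either function repeatedly as its mutation step; only the source of the preference changes.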

Paper Structure (33 sections, 25 equations, 7 figures, 7 tables, 1 algorithm)

Introduction
Background and related work
Deep Learning for Optimization Problems
Optimization of Public Transit
The Transit Network Design Problem
Cost Functions
Markov Decision Process Formulation
Neural Heuristics
Learning to Construct a Network
Training
Evolutionary Algorithm
Benchmark Experiments
Comparison with Baseline Evolutionary Algorithm
Ablation Studies
Effect of number of samples
...and 18 more sections

Figures (7)

Figure 1: An example city graph with ten numbered nodes and three routes. Link edges (representing streets or railways) are black, routes are in colour, and demands are shown by dashed red lines. The edges of the three routes form a subgraph of the link graph $(\mathcal{N}, \mathcal{E}_s)$. All node-pairs are connected by this subgraph, so the three routes form a valid transit network. The demand between nodes 2 and 5 and between 0 and 6 can be satisfied directly by riding on the blue line, and the demand from 7 to 4 by the orange line. The demand from 3 to 9 requires passengers to ride the orange line from node 3 to 8, and transfer to the green line to go from 8 to 9.
Figure 2: A flowchart of the transit network construction process defined by our MDP. Blue boxes indicate points where the timestep $t$ is incremented and the agent selects an action. Red nodes are the beginning and ending of the process. Green nodes are hard-coded decision points. Orange nodes show updates to the state $s$ and action space $\mathcal{A}$.
Figure 3: Trade-offs between average trip time $C_p$ (on the x-axis) and total route time $C_o$ (on the y-axis), across values of $\alpha$ over the range $[0, 1]$ in increments of $0.1$, from $\alpha=0.0$ at lower-right to $1.0$ at upper-left, for transit networks from LC-100, LC-40k, EA, and NEA. Each point shows the mean $C_p$ and $C_o$ over 10 random seeds for one value of $\alpha$, and bars around each point indicate one standard deviation on each axis. Lines linking pairs of points indicate that they represent consecutive $\alpha$ values. Lower values of $C_o$ and $C_p$ are better, so the down-and-leftward direction in each plot represents improvement.
Figure 4: Trade-offs between average trip time $C_p$ (on the x-axis) and total route time $C_o$ (on the y-axis) achieved by all-1 NEA, plotted along with NEA (repeated from Figure 3) for comparison.
Figure 5: Trade-offs between average trip time $C_p$ (on the x-axis) and total route time $C_o$ (on the y-axis) achieved by RC-EA, plotted along with NEA (repeated from Figure 3) for comparison.
...and 2 more figures
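
The validity condition stated in the Figure 1 caption (the route edges must connect every pair of nodes) can be checked with a breadth-first search. The sketch below is illustrative only: the function name `is_valid_network` is hypothetical, routes are assumed to run in both directions, and the three node sequences are invented to be consistent with the demands described in the caption, since the figure's actual layout is not reproduced here.

```python
from collections import deque

def is_valid_network(nodes, routes):
    """Return True if the union of route edges connects every pair of nodes.
    Assumption: each route can be ridden in both directions."""
    adjacency = {n: set() for n in nodes}
    for route in routes:
        # Consecutive stops on a route form an edge of the route subgraph.
        for u, v in zip(route, route[1:]):
            adjacency[u].add(v)
            adjacency[v].add(u)
    if not adjacency:
        return True
    # Breadth-first search from an arbitrary node.
    start = next(iter(adjacency))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node] - seen:
            seen.add(neighbour)
            queue.append(neighbour)
    # Valid only if the search reached every node in the city.
    return len(seen) == len(adjacency)

# Hypothetical routes on ten nodes (0 to 9), invented for illustration.
BLUE = [0, 1, 2, 5, 6]    # serves demands 2-5 and 0-6 directly
ORANGE = [7, 3, 8, 4, 6]  # serves demand 7-4 directly
GREEN = [8, 9]            # reached from orange by a transfer at node 8

print(is_valid_network(range(10), [BLUE, ORANGE, GREEN]))
print(is_valid_network(range(10), [BLUE, ORANGE]))
```

With all three routes the check passes. Dropping the green route leaves node 9 unreachable, so the remaining two routes alone would not form a valid transit network.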
