Applications of deep reinforcement learning to urban transit network design

Andrew Holliday

TL;DR

This work tackles the Transit Network Design Problem (TNDP), the NP-hard task of designing public transit routes that satisfy travel demand while minimizing costs. It introduces a Graph Attention Network-based policy, trained with deep reinforcement learning, that constructs transit networks, and demonstrates that this policy can be used both directly for planning and within metaheuristics, to initialize the solution and to suggest moves during the search. The study shows that combining neural policies with evolutionary or hyper-heuristic algorithms yields networks that outperform purely neural or purely metaheuristic approaches on standard benchmarks (Mandl, Mumford) and on a large instance based on the city of Laval, Quebec, achieving lower operating costs and shorter passenger trips. The findings highlight the practical potential of learned heuristics to speed up the search for high-quality TNDP solutions, and the work outlines key challenges and future directions for real-city deployment and integration with scheduling and demand dynamics.

Abstract

This thesis concerns the use of reinforcement learning to train neural networks to aid in the design of public transit networks. The Transit Network Design Problem (TNDP) is an optimization problem of considerable practical importance. Given a city with an existing road network and travel demands, the goal is to find a set of transit routes - each of which is a path through the graph - that collectively satisfy all demands, while minimizing a cost function that may depend both on passenger satisfaction and operating costs. The existing literature on this problem mainly considers metaheuristic optimization algorithms, such as genetic algorithms and ant-colony optimization. By contrast, we begin by taking a reinforcement learning approach, formulating the construction of a set of transit routes as a Markov Decision Process (MDP) and training a neural net policy to act as the agent in this MDP. We then show that, beyond using this policy to plan a transit network directly, it can be combined with existing metaheuristic algorithms, both to initialize the solution and to suggest promising moves at each step of a search through solution space. We find that such hybrid algorithms, which use a neural policy trained via reinforcement learning as a core component within a classical metaheuristic framework, can plan transit networks that are superior to those planned by either the neural policy or the metaheuristic algorithm. We demonstrate the utility of our approach by using it to redesign the transit network for the city of Laval, Quebec, and show that in simulation, the resulting transit network provides better service at lower cost than the existing transit network.

Paper Structure

This paper contains 74 sections, 35 equations, 34 figures, 10 tables, 1 algorithm.

Figures (34)

  • Figure 1: Robert Moses (left) [moses_image] and Charles-Édouard Jeanneret, also known as Le Corbusier (right) [corbusier_image], two influential figures in the development of automobile-centric cities.
  • Figure 2: Automobile congestion causing travel delays in the city of Toronto, where it is a commonplace occurrence [ctv2020congestion].
  • Figure 3: A schematic of the message-passing step of a layer for one node in a graph. As described in Equation \ref{eqn:message_passing}, node $i$ receives a message $\mathbf{h}^j$ from each neighbouring node $j$ and from itself. Each message is the sending node's feature vector, transformed by the learned matrix $M$ and scaled by the coefficient $\alpha_{ij}$. The received messages are summed to form the layer's new embedding of node $i$.
  • Figure 4: An example city graph with ten numbered nodes and three routes. Street edges are black, routes are in colour, and two example demands are shown by dashed red lines. The edges of the three routes form a sub-graph of the street graph $(\mathcal{N}, \mathcal{E}_s)$. All nodes are connected by this sub-graph, so the three routes form a valid transit network. The demands between nodes 2 and 5, 0 and 6, and 7 and 4 can each be satisfied directly by riding the blue line, while the demand from 3 to 9 requires one transfer: passengers must ride the orange line from node 3 to node 8, and then the green line from node 8 to node 9.
  • Figure 5: Overview of thesis research
  • ...and 29 more figures
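
The message-passing step described in the Figure 3 caption can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the thesis's implementation: the function name `message_passing_step`, the three-node example graph, the matrix `M`, and the coefficient values in `alpha` are all hypothetical. In particular, the coefficients $\alpha_{ij}$ are passed in as fixed numbers here, whereas a Graph Attention Network would compute them from the node features.

```python
def message_passing_step(features, neighbours, M, alpha):
    """Return a new embedding for every node.

    For node i, the new embedding is the sum over j in N(i) plus i itself
    of alpha[i][j] * (M applied to features[j]).
    All argument names are illustrative assumptions.
    """
    def matvec(matrix, vec):
        # Plain matrix-vector product on nested lists.
        return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

    # Transform every node's feature vector once with the shared matrix M.
    transformed = [matvec(M, x) for x in features]
    out_dim = len(M)

    new_embeddings = []
    for i in range(len(features)):
        # Node i receives a message from each neighbour and from itself.
        senders = sorted(set(neighbours[i]) | {i})
        h_i = [0.0] * out_dim
        for j in senders:
            for k in range(out_dim):
                h_i[k] += alpha[i][j] * transformed[j][k]
        new_embeddings.append(h_i)
    return new_embeddings


# Hypothetical example: a path graph 0 - 1 - 2 with 2-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
neighbours = {0: [1], 1: [0, 2], 2: [1]}
M = [[1.0, 0.0], [0.0, 2.0]]          # stand-in for the learned matrix
alpha = [                              # stand-in coefficients, rows sum to 1
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.25],
    [0.0, 0.5, 0.5],
]

embeddings = message_passing_step(features, neighbours, M, alpha)
```

Entries of `alpha` for pairs of nodes that are not neighbours are never read, so only the coefficients on existing edges and self-loops affect the result.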