Dynamic Pricing in High-Speed Railways Using Multi-Agent Reinforcement Learning

Enrique Adrian Villarrubia-Martin, Luis Rodriguez-Benitez, David Muñoz-Valero, Giovanni Montana, Luis Jimenez-Linares

TL;DR

The paper tackles dynamic pricing in high-speed railways by formulating a mixed cooperative-competitive MARL problem within a non-zero-sum Markov game. It introduces RailPricing-RL, a journey-based simulator extending ROBIN to model multi-operator journeys and passenger choices via random utility models, enabling realistic training of MARL agents. Through experiments with TD3, SAC, IQL-SAC, VDN-SAC, MAAC, and MADDPG, the study demonstrates that attention mechanisms can help MAAC handle heterogeneous passenger demand, while shared-reward methods like VDN-SAC achieve higher equity but come with practical limitations. The findings highlight trade-offs between profitability, passenger utility, and equity, and point to future directions such as richer network topologies, fairness-aware rewards, and advanced value-decomposition techniques to better capture asymmetric operator influence.

Abstract

This paper addresses a critical challenge in the high-speed passenger railway industry: designing effective dynamic pricing strategies in the context of competing and cooperating operators. To address this, a multi-agent reinforcement learning (MARL) framework based on a non-zero-sum Markov game is proposed, incorporating random utility models to capture passenger decision making. Unlike prior studies in areas such as energy, airlines, and mobile networks, dynamic pricing for railway systems using deep reinforcement learning has received limited attention. A key contribution of this paper is a parametrisable and versatile reinforcement learning simulator designed to model a variety of railway network configurations and demand patterns while enabling realistic, microscopic modelling of user behaviour, called RailPricing-RL. This environment supports the proposed MARL framework, which models heterogeneous agents competing to maximise individual profits while fostering cooperative behaviour to synchronise connecting services. Experimental results validate the framework, demonstrating how user preferences affect MARL performance and how pricing policies influence passenger choices, utility, and overall system dynamics. This study provides a foundation for advancing dynamic pricing strategies in railway systems, aligning profitability with system-wide efficiency, and supporting future research on optimising pricing policies.

Paper Structure

This paper contains 32 sections, 13 equations, 5 figures, 11 tables, 1 algorithm.

Figures (5)

  • Figure 1: Illustration of a railway network model, where stations are represented as nodes and connections as edges. Edges are colour-coded to indicate the companies operating services between stations. The A-B and B-C markets are operated by separate companies, but through cooperation, they can offer connecting services in the A-C market, which otherwise would have a direct service from only one company. Additionally, the blue and red companies compete directly in the A-D market. Best viewed in colour.
  • Figure 2: Architecture of the ROBIN simulator, comprising the supply module, demand module, and simulation kernel. The kernel integrates daily operator-defined services and probabilistically-generated demand across origin-destination markets to simulate high-speed railway dynamics.
  • Figure 3: Average total profits obtained by each algorithm during training in the Business and Business & Student scenarios. Best viewed in colour.
  • Figure 4: Average total profits earned by agents under each algorithm during training in the Business and Business & Student scenarios. Best viewed in colour.
  • Figure 5: Average attention entropy per agent during training for the Business and Business & Student scenarios using the MAAC algorithm. Best viewed in colour.
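
The attention entropy plotted in Figure 5 can be read as a measure of how evenly an agent spreads its attention over the other agents. The sketch below is a minimal illustration that assumes the usual Shannon entropy of a normalised attention distribution; the function names are hypothetical, and the paper's exact definition (for example, how attention heads are averaged) may differ.

```python
import math


def attention_entropy(weights):
    """Shannon entropy (in nats) of one attention distribution.

    `weights` are non-negative attention scores over the other agents;
    they are normalised here so that they sum to 1. (Hypothetical helper,
    not taken from the paper or from any MAAC implementation.)
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("attention weights must have a positive sum")
    probs = [w / total for w in weights]
    # Zero-probability terms are skipped: p * log(p) tends to 0 as p -> 0.
    return -sum(p * math.log(p) for p in probs if p > 0)


def mean_attention_entropy(batch):
    """Average entropy over a batch of attention distributions."""
    return sum(attention_entropy(w) for w in batch) / len(batch)


# Attention spread evenly over three other agents: maximal entropy, log(3).
uniform = [1 / 3, 1 / 3, 1 / 3]
# Attention focused on a single agent: entropy of zero.
peaked = [1.0, 0.0, 0.0]

print(attention_entropy(uniform))
print(attention_entropy(peaked))
print(mean_attention_entropy([uniform, peaked]))
```

Under this reading, an entropy near the maximum means the agent weighs all other agents about equally, while a falling entropy during training suggests it is learning to focus on a subset of them.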