Dynamic Pricing in High-Speed Railways Using Multi-Agent Reinforcement Learning
Enrique Adrian Villarrubia-Martin, Luis Rodriguez-Benitez, David Muñoz-Valero, Giovanni Montana, Luis Jimenez-Linares
TL;DR
The paper tackles dynamic pricing in high-speed railways by formulating a mixed cooperative-competitive MARL problem within a non-zero-sum Markov game. It introduces RailPricing-RL, a journey-based simulator extending ROBIN to model multi-operator journeys and passenger choices via random utility models, enabling realistic training of MARL agents. Through experiments with TD3, SAC, IQL-SAC, VDN-SAC, MAAC, and MADDPG, the study demonstrates that attention mechanisms can help MAAC handle heterogeneous passenger demand, while shared-reward methods like VDN-SAC achieve higher equity but come with practical limitations. The findings highlight trade-offs between profitability, passenger utility, and equity, and point to future directions such as richer network topologies, fairness-aware rewards, and more advanced value-decomposition techniques that better capture asymmetric operator influence.
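The shared-reward baseline VDN-SAC builds on VDN's additive value decomposition, in which the joint action value is the plain sum of per-agent values. The sketch below only illustrates that decomposition under simplifying assumptions: it uses scalar Q-values instead of neural critics, omits SAC's entropy term and target networks, and its function names are hypothetical rather than taken from the paper's code.

```python
from typing import List, Sequence


def vdn_joint_q(per_agent_q: Sequence[float]) -> float:
    """VDN factorisation: Q_tot(s, a) = sum_i Q_i(o_i, a_i), an unweighted sum."""
    return float(sum(per_agent_q))


def vdn_td_target(shared_reward: float, next_per_agent_q: Sequence[float],
                  gamma: float = 0.99, done: bool = False) -> float:
    """One-step target built from the single shared (team) reward."""
    bootstrap = 0.0 if done else gamma * vdn_joint_q(next_per_agent_q)
    return shared_reward + bootstrap


def vdn_td_loss(per_agent_q: Sequence[float], shared_reward: float,
                next_per_agent_q: Sequence[float], gamma: float = 0.99,
                done: bool = False) -> float:
    """Squared TD error of the summed critic (SAC entropy term deliberately omitted)."""
    target = vdn_td_target(shared_reward, next_per_agent_q, gamma, done)
    return (target - vdn_joint_q(per_agent_q)) ** 2


def per_agent_gradients(per_agent_q: Sequence[float], shared_reward: float,
                        next_per_agent_q: Sequence[float], gamma: float = 0.99,
                        done: bool = False) -> List[float]:
    """dLoss/dQ_i per agent, treating the target as a constant (stop-gradient).

    Because dQ_tot/dQ_i = 1 for every i, all agents receive the same signal.
    """
    target = vdn_td_target(shared_reward, next_per_agent_q, gamma, done)
    td_error = target - vdn_joint_q(per_agent_q)
    return [-2.0 * td_error for _ in per_agent_q]


if __name__ == "__main__":
    q_now, q_next = [10.0, 4.0], [9.0, 5.0]  # two hypothetical operators
    print(vdn_td_loss(q_now, 2.0, q_next, gamma=0.5))          # 25.0
    print(per_agent_gradients(q_now, 2.0, q_next, gamma=0.5))  # [10.0, 10.0]
```

Because the joint value is an unweighted sum, the same TD error reaches every operator's critic regardless of how much each operator actually contributed. This is one plausible reading of why more advanced value-decomposition techniques are suggested for capturing asymmetric operator influence.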
Abstract
This paper addresses a critical challenge in the high-speed passenger railway industry: designing effective dynamic pricing strategies when operators both compete and cooperate. To this end, a multi-agent reinforcement learning (MARL) framework based on a non-zero-sum Markov game is proposed, incorporating random utility models to capture passenger decision-making. Although dynamic pricing with deep reinforcement learning has been studied in areas such as energy, airlines, and mobile networks, its application to railway systems has received limited attention. A key contribution of this paper is RailPricing-RL, a parametrisable and versatile reinforcement learning simulator designed to model a variety of railway network configurations and demand patterns while enabling realistic, microscopic modelling of user behaviour. This environment supports the proposed MARL framework, which models heterogeneous agents competing to maximise individual profits while fostering cooperative behaviour to synchronise connecting services. Experimental results validate the framework, demonstrating how user preferences affect MARL performance and how pricing policies influence passenger choices, utility, and overall system dynamics. This study provides a foundation for advancing dynamic pricing strategies in railway systems, aligning profitability with system-wide efficiency, and supporting future research on optimising pricing policies.
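Random utility models of passenger choice can be illustrated with the multinomial logit, the most common member of that family: each option's utility is a systematic part plus i.i.d. Gumbel noise, which yields closed-form softmax choice probabilities. The sketch below is a minimal illustration under stated assumptions. The `Journey` fields, the taste coefficients, and the outside-option utility are all hypothetical and are not taken from RailPricing-RL or ROBIN.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Journey:
    """Hypothetical journey offer; fields are illustrative, not RailPricing-RL's schema."""
    name: str
    price: float          # ticket price in currency units
    travel_time: float    # in-vehicle plus transfer time, minutes
    departure_gap: float  # |desired - scheduled departure|, minutes


# Hypothetical taste coefficients for one passenger segment (negative = disutility).
BETA_PRICE = -0.05
BETA_TIME = -0.02
BETA_GAP = -0.01
OUTSIDE_UTILITY = -7.0  # hypothetical utility of not buying any rail journey


def systematic_utility(j: Journey) -> float:
    """Observable part V_j of the random utility U_j = V_j + eps_j."""
    return BETA_PRICE * j.price + BETA_TIME * j.travel_time + BETA_GAP * j.departure_gap


def logit_probabilities(journeys: Sequence[Journey]) -> List[float]:
    """Multinomial logit: P(j) = exp(V_j) / sum_k exp(V_k).

    The last entry is the probability of the outside (no-purchase) option.
    """
    utilities = [systematic_utility(j) for j in journeys] + [OUTSIDE_UTILITY]
    m = max(utilities)  # shift by the max for numerical stability
    weights = [math.exp(u - m) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def sample_choice(journeys: Sequence[Journey], rng: random.Random) -> Optional[Journey]:
    """Simulate one passenger: add Gumbel noise to each utility and take the argmax.

    Returns None when the outside option wins (the passenger does not buy).
    """
    utilities = [systematic_utility(j) for j in journeys] + [OUTSIDE_UTILITY]
    noisy = []
    for v in utilities:
        u = max(rng.random(), 1e-12)  # random() is in [0, 1); guard against log(0)
        noisy.append(v - math.log(-math.log(u)))  # add a standard Gumbel draw
    best = max(range(len(noisy)), key=noisy.__getitem__)
    return None if best == len(journeys) else journeys[best]


if __name__ == "__main__":
    offers = [
        Journey("Operator A, direct", price=60.0, travel_time=150.0, departure_gap=0.0),
        Journey("Operator B, connecting", price=40.0, travel_time=180.0, departure_gap=30.0),
    ]
    print([round(p, 3) for p in logit_probabilities(offers)])  # [0.404, 0.447, 0.149]
    choice = sample_choice(offers, random.Random(0))
    print(choice.name if choice else "no purchase")
```

Sampling with Gumbel noise (the Gumbel-max trick) reproduces the logit probabilities in expectation while simulating each passenger individually, which matches the spirit of microscopic behaviour modelling; a pricing agent would influence demand only through the `price` term.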
