Long-term Fairness in Ride-Hailing Platform

Yufan Kang, Jeffrey Chan, Wei Shao, Flora D. Salim, Christopher Leckie

TL;DR

This work tackles long-term fairness in ride-hailing by modelling the matching problem as a Markov Decision Process that balances total utility against earnings equity across drivers. It integrates a time-series forecasting module that predicts future ride requests and embeds this forecast into a centralised multi-objective multi-agent Q-learning framework with a custom scalarisation function to trade off efficiency and fairness. The objective is formalised as $\max_M \ \pi(M) - \lambda F(M)$ with $F(M)=\operatorname{Var}(o^{t_n}_v(M(v)))$, so the fairness term $F(M)$ is a variance across drivers weighted by $\lambda$; this supports non-myopic, look-ahead decisions over a weekly horizon. Experiments on the New York City Taxi dataset show improved long-term fairness and stability over baselines, supporting the approach's suitability for real-world deployment.
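As a rough illustration of the objective above, the sketch below scores a candidate matching by its total utility minus $\lambda$ times the variance of per-driver utility. The function name `objective`, the driver utilities, and the choice of population variance are assumptions made for this example, not details taken from the paper.

```python
from statistics import pvariance

def objective(utilities, lam):
    """Total utility minus lam times the variance of per-driver utility."""
    total_utility = sum(utilities)            # efficiency term
    fairness_penalty = pvariance(utilities)   # assumed: population variance across drivers
    return total_utility - lam * fairness_penalty

# Hypothetical accumulated utilities for three drivers under two candidate matchings.
efficient_but_unequal = [10.0, 2.0, 3.0]
balanced = [5.0, 4.0, 5.0]

for lam in (0.0, 1.0):
    print(lam, objective(efficient_but_unequal, lam), objective(balanced, lam))
```

With $\lambda = 0$ the higher-utility matching scores best; with $\lambda = 1$ the variance penalty makes the balanced matching preferable, which is the trade-off the weight is meant to control.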

Abstract

Matching in two-sided markets such as ride-hailing has recently received significant attention. However, existing studies on ride-hailing mainly focus on optimising efficiency, and fairness issues have been neglected. These issues, including significant earning differences between drivers and variance in passenger waiting times across locations, have potential economic and ethical impacts. Recent studies that focus on fairness in ride-hailing use traditional optimisation methods and the Markov Decision Process to balance efficiency and fairness. However, these studies have several limitations, such as myopic short-term decision-making in traditional optimisation and instability of fairness over a comparatively longer horizon in both traditional optimisation and Markov Decision Process-based methods. To address these issues, we propose a dynamic Markov Decision Process model that alleviates the fairness issues currently faced by ride-hailing and seeks a balance between efficiency and fairness, with two distinct characteristics: (i) a prediction module that predicts the number of requests that will be raised in the future from different locations, allowing the proposed method to consider long-term fairness over the whole timeline rather than only historical and current data patterns; (ii) a customised scalarisation function for multi-objective multi-agent Q-learning that aims to balance efficiency and fairness. Extensive experiments on a publicly available real-world dataset demonstrate that our proposed method outperforms existing state-of-the-art methods.
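The idea of scalarising a multi-objective Q-value can be sketched with a toy tabular example. This is not the paper's customised scalarisation function or its multi-agent setup: it assumes a single agent, a single state, a plain linear scalarisation, and made-up reward vectors of the form (utility, unfairness). All names (`scalarise`, `q_update`, `train`, the action labels) are hypothetical, and the forecasting module is omitted.

```python
from collections import defaultdict

ACTIONS = ["serve_high_earner", "serve_low_earner"]
# Hypothetical reward vectors: (utility, unfairness) for each action.
REWARDS = {"serve_high_earner": (1.0, 2.0), "serve_low_earner": (0.8, 0.0)}

def scalarise(q_vec, lam):
    """Assumed linear scalarisation: utility estimate minus lam times unfairness estimate."""
    utility, unfairness = q_vec
    return utility - lam * unfairness

def greedy_action(q_table, state, lam):
    """Pick the action whose scalarised Q-vector is largest."""
    return max(ACTIONS, key=lambda a: scalarise(q_table[(state, a)], lam))

def q_update(q_table, state, action, reward_vec, next_state, lam, alpha=0.1, gamma=0.9):
    """One Q-learning step per objective, bootstrapping from the scalarised-greedy action."""
    best_next = greedy_action(q_table, next_state, lam)
    old = q_table[(state, action)]
    nxt = q_table[(next_state, best_next)]
    q_table[(state, action)] = tuple(
        o + alpha * (r + gamma * n - o) for o, r, n in zip(old, reward_vec, nxt)
    )

def train(lam, steps=200):
    """Sweep both actions repeatedly in a single toy state and return the greedy action."""
    q_table = defaultdict(lambda: (0.0, 0.0))
    state = "s"
    for _ in range(steps):
        for action in ACTIONS:
            q_update(q_table, state, action, REWARDS[action], state, lam)
    return greedy_action(q_table, state, lam)

print(train(lam=0.0))
print(train(lam=1.0))
```

Keeping one Q-estimate per objective and combining them only at action-selection time means the trade-off weight can be changed without redefining the reward signal; a non-linear scalarisation would replace `scalarise` alone.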

Paper Structure

This paper contains 24 sections, 6 equations, 5 figures, and 2 tables.

Figures (5)

  • Figure 1: The disparity in fairness concerns between short-term and long-term allocation, shown by two allocation systems that each generate an allocation plan at every timestep. The arrows pointing to vehicles represent allocated requests, which carry different utility for different drivers, and the dollar signs next to drivers indicate the total utility accumulated by the end of each timestep. In this example, the algorithm that prioritises short-term fairness achieves absolute fairness by the end of the second timestep, with no variance in utility among drivers, but becomes unfair by the third timestep. The algorithm that focuses on long-term fairness appears relatively unfair at the end of the second timestep but ultimately achieves absolute fairness by the third timestep.
  • Figure 2: Long-term fairness for a ride-hailing system. With time-series prediction, the predicted requests become part of the action space of the MDP-based model, so that the output allocation plan reflects the pattern of future requests.
  • Figure 3: Multi-objective multi-agent Q-learning. By customising the action space and the scalarisation function, we use multi-objective multi-agent Q-learning to encourage a balance between utility and fairness. The action space includes historical, current, and predicted future requests, so that the proposed model is trained on the pattern of future requests. The scalarisation function is designed to balance utility and fairness and to maximise the objective.
  • Figure 4: Fairness of the baselines and the proposed model as the time horizon is gradually increased.
  • Figure 5: Ablation study. Fairness of the proposed model with individual modules removed, as the time horizon is gradually increased by a number of days. A larger fairness value indicates a less fair model.
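The contrast described in the Figure 1 caption can be reproduced with a small numeric sketch. The per-timestep utilities below are invented for illustration (they are not the values in the figure), and `variance_over_time` is a hypothetical helper that tracks the variance of cumulative utility across two drivers.

```python
from itertools import accumulate
from statistics import pvariance

def variance_over_time(per_step_utilities):
    """Variance of cumulative utility across drivers at the end of each timestep."""
    cumulative = [list(accumulate(driver)) for driver in per_step_utilities]
    # zip(*cumulative) regroups the data by timestep instead of by driver.
    return [pvariance(step) for step in zip(*cumulative)]

# Hypothetical utilities earned by two drivers over three timesteps.
short_term_plan = [[3.0, 1.0, 4.0], [1.0, 3.0, 1.0]]
long_term_plan = [[3.0, 2.0, 1.0], [1.0, 1.0, 4.0]]

print(variance_over_time(short_term_plan))
print(variance_over_time(long_term_plan))
```

In this made-up data the short-term plan reaches zero variance at the second timestep and 2.25 at the third, while the long-term plan shows the reverse, mirroring the pattern the caption describes.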