Scalable Multi-Objective Reinforcement Learning with Fairness Guarantees using Lorenz Dominance

Dimitris Michailidis, Willem Röpke, Diederik M. Roijers, Sennay Ghebreab, Fernando P. Santos

TL;DR

This paper introduces a principled algorithm that incorporates fairness into MORL while improving scalability to many-objective problems. It proposes Lorenz dominance to identify policies with equitable reward distributions and introduces λ-Lorenz dominance to enable flexible fairness preferences.

Abstract

Multi-Objective Reinforcement Learning (MORL) aims to learn a set of policies that optimize trade-offs between multiple, often conflicting objectives. MORL is computationally more complex than single-objective RL, particularly as the number of objectives increases. Additionally, when objectives involve the preferences of agents or groups, ensuring fairness is socially desirable. This paper introduces a principled algorithm that incorporates fairness into MORL while improving scalability to many-objective problems. We propose using Lorenz dominance to identify policies with equitable reward distributions and introduce λ-Lorenz dominance to enable flexible fairness preferences. We release a new, large-scale real-world transport planning environment and demonstrate that our method encourages the discovery of fair policies, showing improved scalability in two large cities (Xi'an and Amsterdam). Our methods outperform common multi-objective approaches, particularly in high-dimensional objective spaces.
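
To make the Lorenz dominance idea in the abstract concrete, here is a minimal sketch in Python. It assumes the standard definition: a return vector is sorted in ascending order and cumulatively summed to form its Lorenz vector, and one vector Lorenz-dominates another when its Lorenz vector Pareto-dominates the other's. It also assumes objectives are maximized. The function names are illustrative and are not taken from the paper's code.

```python
from itertools import accumulate
from typing import Iterable, List, Sequence, Tuple


def lorenz_vector(v: Sequence[float]) -> List[float]:
    """Cumulative sums of v sorted ascending (worst-off objective first)."""
    return list(accumulate(sorted(v)))


def pareto_dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def lorenz_dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if the Lorenz vector of a Pareto-dominates the Lorenz vector of b."""
    return pareto_dominates(lorenz_vector(a), lorenz_vector(b))


def lorenz_front(points: Iterable[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point Lorenz-dominates."""
    pts = list(points)
    return [p for p in pts if not any(lorenz_dominates(q, p) for q in pts)]


if __name__ == "__main__":
    # (2, 2) and (3, 1) are Pareto-incomparable, but (2, 2) is more equal.
    print(pareto_dominates((2, 2), (3, 1)))
    print(lorenz_dominates((2, 2), (3, 1)))
    print(lorenz_front([(4, 0), (3, 1), (2, 2), (1, 3)]))
```

Because every Pareto-dominated vector is also Lorenz-dominated, the Lorenz front is never larger than the Pareto front. A smaller set of candidate policies is one intuition for why this criterion can help with many objectives.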

Paper Structure

This paper contains 26 sections, 5 theorems, 19 equations, 14 figures, and 2 tables.

Key Result

Theorem 1

$\forall \lambda_1, \lambda_2: 0 \leq \lambda_1 \leq \lambda_2 \leq 1$ and $\forall D \subset \mathbb{R}^d$ the following relations hold.

Figures (14)

  • Figure 1: The Pareto and Lorenz-dominated areas of vector S. The Lorenz-dominated area includes the Pareto-dominated area and is symmetric around the equality line, except for the symmetric vector S' = (3, 2). This expands the dominated region, leaving fewer acceptable trade-offs.
  • Figure 2: Lorenz Conditioned Networks (LCN) is a multi-policy method that offers fair trade-offs between different objectives (left). Reference points speed up training by filtering the Experience Replay buffer, steering training towards desired solutions (center). $\lambda$-LCN adds flexibility in fairness preferences, allowing fairness constraints to be relaxed to accommodate more diverse policies (right).
  • Figure 3: Two real-world instances of the MO-TNDP environment in Xi'an (China) [wei_city_2020] and Amsterdam (Netherlands). (A) shows the aggregate Origin-Destination Demand per cell (sum of incoming and outgoing flows); (B) shows the group of each cell, based on the house price index quintiles.
  • Figure 4: (a) LCN outperforms PCN and GPI-LS across all objectives in the Sen Welfare measure (Xi'an). Additionally, LCN outperforms PCN in hypervolume when the number of objectives is $> 4$ and in EUM when it is $> 6$, showcasing its scalability over the objective space. (b) A comparison of the trained policies of the proposed LCN, LCN-Redist and LCN-Mean models.
  • Figure 5: Learning Curves for EUM on 3 and 10 objectives (curves for all objectives are in the supplementary material).
  • ...and 9 more figures
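
Figure 4 reports results in the Sen Welfare measure. As a rough illustration, the sketch below assumes the textbook form of Sen's welfare function, the mean of a vector multiplied by one minus its Gini coefficient. The paper's exact normalization is not shown on this page and may differ. The function names are hypothetical, and the inputs are assumed to be non-negative.

```python
from typing import Sequence


def gini(v: Sequence[float]) -> float:
    """Gini coefficient via mean absolute difference (assumes non-negative values)."""
    n = len(v)
    mean = sum(v) / n
    if mean == 0:
        return 0.0  # all-zero vector: treat as perfectly equal
    diff_sum = sum(abs(x - y) for x in v for y in v)
    return diff_sum / (2 * n * n * mean)


def sen_welfare(v: Sequence[float]) -> float:
    """Textbook Sen welfare: efficiency (mean) discounted by inequality (Gini)."""
    return (sum(v) / len(v)) * (1 - gini(v))


if __name__ == "__main__":
    # Three vectors with the same total reward but different spreads.
    for rewards in [(2, 2), (3, 1), (4, 0)]:
        print(rewards, sen_welfare(rewards))
```

Under this assumed definition, vectors with the same total reward score higher the more evenly that reward is spread across objectives.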

Theorems & Definitions (12)

  • Definition 1
  • Definition 2
  • Definition 3: $\lambda$-Lorenz dominance
  • Theorem 1
  • Lemma 2
  • proof
  • Theorem 3
  • proof
  • Lemma 4
  • proof
  • ...and 2 more