Remembering to Be Fair: Non-Markovian Fairness in Sequential Decision Making

Parand A. Alamdari; Toryn Q. Klassen; Elliot Creager; Sheila A. McIlraith

TL;DR

This work tackles fairness in sequential, multi-stakeholder decision making, arguing that fairness assessments depend on historical context and are inherently non-Markovian. It formalizes non-Markovian fairness through a multi-stakeholder MDP and a fairness scheme ⟨U, W_ex, B⟩, enabling timepoint-based notions such as long-term, periodic, anytime, and bounded fairness. The authors propose memory augmentation, which turns non-Markovian fairness into a tractable, reward-like objective, and the FairQCM algorithm, which generates counterfactual experiences to improve sample efficiency when learning fair policies. Empirical results in doughnut allocation and simulated lending domains show that FairQCM outperforms baselines and that memory can improve fairness without sacrificing the core objectives. The framework supports both policy synthesis and fairness auditing in dynamic, real-world decision processes.
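The general idea of counterfactual experiences can be sketched as follows. This is a minimal illustration, not FairQCM itself. It assumes two stakeholders, a hypothetical memory that stores the clipped difference between their accumulated rewards, and a hypothetical fairness reward equal to minus the absolute difference after the step. Because the memory update `mu` is deterministic and does not affect the environment, one observed environment transition is also a valid transition for every other memory value, so a single real step yields one experience per memory value for an ordinary tabular Q-learning update. All names and parameters (`K`, `mu`, `fair_reward`, `counterfactual_experiences`, `q_update`) are illustrative.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterator, List, Tuple

K = 3                          # hypothetical clipping bound on the memory
MEMORIES = range(-K, K + 1)    # finite memory space: clipped reward difference

def mu(d: int, r: Tuple[int, int]) -> int:
    """Hypothetical memory update: add this step's reward difference, clip to [-K, K]."""
    return max(-K, min(K, d + r[0] - r[1]))

def fair_reward(d_next: int) -> int:
    """Hypothetical fairness reward: penalize imbalance after the step."""
    return -abs(d_next)

Experience = Tuple[Tuple[Hashable, int], Hashable, float, Tuple[Hashable, int]]

def counterfactual_experiences(s, a, r, s_next) -> Iterator[Experience]:
    """Relabel one real environment transition with every possible memory value."""
    for d in MEMORIES:
        d_next = mu(d, r)
        yield (s, d), a, fair_reward(d_next), (s_next, d_next)

def q_update(Q: Dict, actions: List[Hashable], experiences,
             alpha: float = 0.5, gamma: float = 0.9) -> None:
    """Standard tabular Q-learning update applied to each (possibly counterfactual) experience."""
    for x, a, rew, x_next in experiences:
        best_next = max(Q[(x_next, b)] for b in actions)
        Q[(x, a)] += alpha * (rew + gamma * best_next - Q[(x, a)])

if __name__ == "__main__":
    Q = defaultdict(float)
    exps = list(counterfactual_experiences("s0", "A", (1, 0), "s0"))
    q_update(Q, ["A", "B"], exps)
    print(len(exps), Q[(("s0", 0), "A")])  # 7 -0.5
```

With `K = 3`, a single real step produces seven experiences, so the learner updates Q-values for memory values it has not visited yet. Clipping keeps the memory space finite at the cost of forgetting imbalances larger than K.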

Abstract

Fair decision making has largely been studied with respect to a single decision. Here we investigate the notion of fairness in the context of sequential decision making where multiple stakeholders can be affected by the outcomes of decisions. We observe that fairness often depends on the history of the sequential decision-making process, and in this sense it is inherently non-Markovian. We further observe that fairness often needs to be assessed at time points within the process, not just at the end of the process. To advance our understanding of this class of fairness problems, we explore the notion of non-Markovian fairness in the context of sequential decision making. We identify properties of non-Markovian fairness, including notions of long-term, anytime, periodic, and bounded fairness. We explore the interplay between non-Markovian fairness and memory and how memory can support construction of fair policies. Finally, we introduce the FairQCM algorithm, which can automatically augment its training data to improve sample efficiency in the synthesis of fair policies via reinforcement learning.
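The timepoint-based notions can be illustrated with a small sketch. It assumes an unfairness measure (the gap between the best-off and worst-off stakeholder's accumulated reward) and a worst-case aggregation over the evaluated timepoints; the paper's fairness scheme may use other functions, and bounded fairness is not modeled here. The two hypothetical traces give both stakeholders the same final total, so long-term evaluation cannot tell them apart, while periodic and anytime evaluation can.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple

Rewards = Sequence[Tuple[float, ...]]  # one reward per stakeholder at each step

def running_totals(rewards: Rewards) -> List[Tuple[float, ...]]:
    """Accumulated reward of every stakeholder after each step of the trace."""
    return list(accumulate(rewards, lambda acc, r: tuple(x + y for x, y in zip(acc, r))))

def gap(totals: Tuple[float, ...]) -> float:
    """Assumed unfairness measure: best-off minus worst-off stakeholder."""
    return max(totals) - min(totals)

def long_term_unfairness(rewards: Rewards) -> float:
    """Evaluate only at the end of the trace."""
    return gap(running_totals(rewards)[-1])

def periodic_unfairness(rewards: Rewards, period: int) -> float:
    """Evaluate every `period` steps (assumes period <= len(rewards)); keep the worst."""
    totals = running_totals(rewards)
    return max(gap(totals[t]) for t in range(period - 1, len(totals), period))

def anytime_unfairness(rewards: Rewards) -> float:
    """Evaluate after every step and keep the worst timepoint."""
    return max(gap(t) for t in running_totals(rewards))

front_loaded = [(1, 0), (1, 0), (0, 1), (0, 1)]  # stakeholder A is served first
alternating = [(1, 0), (0, 1), (1, 0), (0, 1)]   # turns alternate

for name, trace in [("front_loaded", front_loaded), ("alternating", alternating)]:
    print(name, long_term_unfairness(trace), periodic_unfairness(trace, 2), anytime_unfairness(trace))
# front_loaded 0 2 2
# alternating 0 0 1
```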

Paper Structure (37 sections, 3 theorems, 12 equations, 5 figures, 2 tables, 1 algorithm)

Introduction
Related Work
Preliminaries
Fairness over Time
Fairness Schemes
More on the extended aggregation function
Unfairness for Individual Stakeholders
Computing Fair Policies
The Role of Memory
Counterfactual Memories for RL
Computations with Filter Functions
Experiments
Resource Allocation
Simulated Lending
Conclusion
...and 22 more sections

Key Result

Theorem 5.4

Let $\langle \langle S,s_\text{init}, A,P,R_1,\dots, R_n, \gamma \rangle,\langle U,W_\text{ex} \rangle \rangle$ be an NMFDP where $U$ is value-regular. Then there exists a memory augmentation $\langle M,m_\text{init}, \mu \rangle$ such that, in the resulting memory-augmented NMFDP, the function $U'$ corresponding to $U$ is Markovian.
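A toy illustration of the memory idea behind Theorem 5.4, under assumptions of our own: the history-dependent quantity is the worst gap between two stakeholders' accumulated rewards seen so far, and the hypothetical update `mu` takes the previous memory and the latest reward vector (the paper's μ may take different arguments, and this memory is real-valued rather than finite). Once the memory stores running totals and the worst gap, the unfairness value depends only on the current memory. The names `Memory`, `m_init`, `mu`, `unfairness`, and `run` are illustrative.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

@dataclass(frozen=True)
class Memory:
    totals: Tuple[float, ...]  # accumulated reward per stakeholder
    worst_gap: float           # largest best-minus-worst gap observed so far

def m_init(n_stakeholders: int) -> Memory:
    return Memory(tuple(0.0 for _ in range(n_stakeholders)), 0.0)

def mu(m: Memory, rewards: Sequence[float]) -> Memory:
    """Hypothetical memory update applied after each step."""
    totals = tuple(t + r for t, r in zip(m.totals, rewards))
    return Memory(totals, max(m.worst_gap, max(totals) - min(totals)))

def unfairness(m: Memory) -> float:
    """Reads only the current memory, so it is Markovian in the augmented state."""
    return m.worst_gap

def run(trace) -> Memory:
    """Fold the memory update over a two-stakeholder trace."""
    m = m_init(2)
    for r in trace:
        m = mu(m, r)
    return m

# Same final totals, different histories: only the memory tells them apart.
print(unfairness(run([(1, 0), (1, 0), (0, 1), (0, 1)])))  # 2.0
print(unfairness(run([(1, 0), (0, 1), (1, 0), (0, 1)])))  # 1.0
```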

Figures (5)

Figure 1: Two processes for distributing vaccines to countries A and B. Both result in an equal distribution of vaccine at the end. Monthly evaluation shows that the first process favors A for a time.
Figure 2: Resource Allocation: In simulations of our doughnut allocation task, (deep) FairQCM achieves higher Nash welfare than competing memory-augmented RL agents (left), while learning to allocate doughnuts effectively near the end of training (right).
Figure 3: Simulated Lending: Accumulated Relaxed Demographic Parity scores for different approaches to augmenting memory during different phases of training.
Figure 4: Tabular Q-Learning for Resource Allocation: The left plot shows accumulated Nash welfare scores at the end of the episode for different approaches to augmenting memory during different phases of training. The right plot shows the number of doughnuts that are not wasted during the process for each approach.
Figure 5: Simulated Lending with Gaussian Credit Score Changes: Accumulated Relaxed Demographic Parity scores for different approaches to augmenting memory during different phases of training.
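Figures 2 and 4 report Nash welfare. A common definition, assumed here, is the product of the stakeholders' accumulated utilities. The sketch contrasts it with the utilitarian sum on three hypothetical splits of 10 doughnuts between two stakeholders.

```python
import math
from typing import Sequence

def nash_welfare(utilities: Sequence[float]) -> float:
    """Product of stakeholder utilities (assumed definition)."""
    return math.prod(utilities)

def utilitarian_welfare(utilities: Sequence[float]) -> float:
    """Plain sum, shown for contrast."""
    return sum(utilities)

for alloc in [(5, 5), (9, 1), (10, 0)]:
    print(alloc, utilitarian_welfare(alloc), nash_welfare(alloc))
# (5, 5) 10 25
# (9, 1) 10 9
# (10, 0) 10 0
```

The sum is identical for all three splits, while the product favors the balanced one and drops to zero whenever some stakeholder receives nothing. Some works use the geometric mean instead, which ranks nonnegative allocations identically.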

Theorems & Definitions (23)

Definition 3.1: Multi-stakeholder Markov Decision Process
Definition 4.1: Long-term fairness
Definition 4.2: Periodic fairness
Definition 4.3: Anytime fairness
Definition 4.4: Bounded fairness
Definition 4.5: Fairness scheme
Definition 4.6: Non-Markovian Fair Decision Process (NMFDP)
Definition 4.7: Fairness score of a trace
Definition 4.8: Fairness score of a policy
Definition 4.9: Timepoint-first
...and 13 more
