Remember, but also, Forget: Bridging Myopic and Perfect Recall Fairness with Past-Discounting

Ashwin Kumar, William Yeoh

TL;DR

The paper addresses fairness in dynamic, multi-agent resource allocation by introducing past-discounted historical utilities. It defines a tunable decay factor $\gamma_p$ to interpolate between instantaneous and perfect-recall fairness, and shows that this approach yields a bounded augmented state space, enhancing computational tractability for sequential decision-making. Formulations are provided for both additive and averaged utilities, with theoretical justification and comparative analysis against traditional paradigms. The work aims to better align fairness with human temporal judgments while enabling scalable reinforcement learning in dynamic settings.

Abstract

Dynamic resource allocation in multi-agent settings often requires balancing efficiency with fairness over time--a challenge inadequately addressed by conventional, myopic fairness measures. Motivated by behavioral insights that human judgments of fairness evolve with temporal distance, we introduce a novel framework for temporal fairness that incorporates past-discounting mechanisms. By applying a tunable discount factor to historical utilities, our approach interpolates between instantaneous and perfect-recall fairness, thereby capturing both immediate outcomes and long-term equity considerations. Beyond aligning more closely with human perceptions of fairness, this past-discounting method ensures that the augmented state space remains bounded, significantly improving computational tractability in sequential decision-making settings. We detail the formulation of discounted-recall fairness in both additive and averaged utility contexts, illustrate its benefits through practical examples, and discuss its implications for designing balanced, scalable resource allocation strategies.
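As an illustration of the interpolation described above, the following is a minimal sketch, not the paper's implementation. It assumes the additive past-discounted utility follows the recursion $Z_t = \gamma_p Z_{t-1} + u_t$; that form, the function name `discounted_utility`, and the sample utility streams are all hypothetical and are not taken from the text above. Under this assumption, $\gamma_p = 0$ keeps only the current utility (instantaneous fairness), $\gamma_p = 1$ keeps the full running sum (perfect recall), and any $\gamma_p < 1$ with per-step utilities capped at $u_{max}$ keeps $Z_t$ at or below $u_{max} / (1 - \gamma_p)$, which is one way to see why the augmented state stays bounded.

```python
def discounted_utility(utilities, gamma_p):
    """Return the past-discounted running utility after each step.

    Assumed recursion (illustrative, not quoted from the paper):
        Z_t = gamma_p * Z_{t-1} + u_t, with Z_0 = 0.
    """
    z = 0.0
    history = []
    for u in utilities:
        z = gamma_p * z + u  # older utilities shrink by gamma_p every step
        history.append(z)
    return history


# Hypothetical utility streams: Alice is served alone for 5 steps,
# then Bob joins and is served alone for the next 5 steps.
alice = [1.0] * 5 + [0.0] * 5
bob = [0.0] * 5 + [1.0] * 5

for gamma_p in (0.0, 0.5, 1.0):
    z_alice = discounted_utility(alice, gamma_p)
    z_bob = discounted_utility(bob, gamma_p)
    # Perceived utility gap between the two agents at the final step.
    gap = z_alice[-1] - z_bob[-1]
    print(f"gamma_p={gamma_p}: final perceived gap (Alice - Bob) = {gap:.4f}")
```

Varying `gamma_p` between 0 and 1 changes how quickly Alice's early advantage fades from the perceived gap, which is the qualitative behavior the past-discounting framework is meant to control.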

Paper Structure

This paper contains 12 sections, 12 equations, 1 figure.

Figures (1)

  • Figure 1: Comparison of cumulative utility differences under different fairness paradigms. (Left) Cumulative utility difference, $\sum U_{Alice} - \sum U_{Bob}$, over time for Example \ref{ex:inst_example}, where both agents participate from the start. (Center) Cumulative utility difference, $\sum U_{Alice} - \sum U_{Bob}$, over time for Example \ref{ex:recall_example}, where only Alice is active initially and Bob joins later. (Right) Difference in perceived utility between Alice and Bob under all three methods for Example \ref{ex:recall_example}. This plot shows how $\gamma_p$ changes the speed at which past decisions are forgotten, interpolating between perfect-recall and instantaneous fairness.