Fairness Incentives in Response to Unfair Dynamic Pricing

Jesse Thibodeau; Hadi Nekoei; Afaf Taïk; Janarthanan Rajendran; Golnoosh Farnadi

TL;DR

This paper addresses fairness in dynamic pricing by introducing a benevolent social planner that leverages taxation and redistribution to align firm pricing with population-based demand, thereby reducing disparities between consumer groups. It develops three policy formulations—multi-armed bandit, contextual bandit, and full reinforcement learning—to learn effective incentives, augmented by a FairReplayBuffer to ensure coverage across fairness contexts. Empirical results show that RL-based SP policies can approach or exceed analytically optimal fairness-aware baselines, with the full RL approach achieving up to a 13.19% welfare gain over the fairness-aware baseline in the simulated economy. The work demonstrates the potential of AI-driven mechanism design to improve social welfare in markets affected by dynamic pricing, while acknowledging limitations and the need for extension to competitive settings and broader fairness notions.

Abstract

The use of dynamic pricing by profit-maximizing firms gives rise to demand fairness concerns, measured by discrepancies in consumer groups' demand responses to a given pricing strategy. Notably, dynamic pricing may result in buyer distributions unreflective of those of the underlying population, which can be problematic in markets where fair representation is socially desirable. To address this, policy makers might leverage tools such as taxation and subsidy to adapt policy mechanisms to their social objective. In this paper, we explore the potential for AI methods to assist such intervention strategies. To this end, we design a basic simulated economy, wherein we introduce a dynamic social planner (SP) to generate corporate taxation schedules geared to incentivizing firms towards adopting fair pricing behaviours, and to use the collected tax budget to subsidize consumption among underrepresented groups. To cover a range of possible policy scenarios, we formulate our social planner's learning problem as a multi-armed bandit, a contextual bandit and finally as a full reinforcement learning (RL) problem, evaluating welfare outcomes from each case. To alleviate the difficulty in retaining meaningful tax rates that apply to less frequently occurring brackets, we introduce FairReplayBuffer, which ensures that our RL agent samples experiences uniformly across a discretized fairness space. We find that, upon deploying a learned tax and redistribution policy, social welfare improves on that of the fairness-agnostic baseline, approaches that of the analytically optimal fairness-aware baseline in the multi-armed and contextual bandit settings, and surpasses it by 13.19% in the full RL setting.
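The abstract describes FairReplayBuffer only at a high level: experiences are sampled uniformly across a discretized fairness space. The sketch below illustrates one way such stratified sampling could work; it is not the authors' implementation. The class name `StratifiedFairnessBuffer`, the parameters `num_bins` and `capacity_per_bin`, the assumption that each transition carries a fairness score in [0, 1], and the per-bin FIFO eviction are all hypothetical choices made for illustration.

```python
import random
from collections import deque


class StratifiedFairnessBuffer:
    """Hypothetical replay buffer with one FIFO queue per fairness bin.

    Assumption: every transition is stored with a fairness score in [0, 1].
    """

    def __init__(self, num_bins=10, capacity_per_bin=1000, seed=None):
        self.num_bins = num_bins
        # Each bin evicts its own oldest entries, so rare bins are never
        # overwritten by transitions from frequently visited bins.
        self.bins = [deque(maxlen=capacity_per_bin) for _ in range(num_bins)]
        self.rng = random.Random(seed)

    def _bin_index(self, fairness):
        # Clip to [0, 1], then map to a bin; a score of 1.0 goes to the last bin.
        clipped = min(max(fairness, 0.0), 1.0)
        return min(int(clipped * self.num_bins), self.num_bins - 1)

    def add(self, transition, fairness):
        self.bins[self._bin_index(fairness)].append(transition)

    def sample(self, batch_size):
        non_empty = [b for b in self.bins if b]
        if not non_empty:
            raise ValueError("cannot sample from an empty buffer")
        batch = []
        for _ in range(batch_size):
            # Pick a bin uniformly first, then a transition within that bin,
            # so each visited fairness level is equally likely to be drawn.
            chosen_bin = self.rng.choice(non_empty)
            batch.append(self.rng.choice(chosen_bin))
        return batch


# Usage: 95 transitions at high fairness, 5 at low fairness.
buffer = StratifiedFairnessBuffer(num_bins=10, capacity_per_bin=100, seed=0)
for i in range(95):
    buffer.add(("common", i), fairness=0.95)
for i in range(5):
    buffer.add(("rare", i), fairness=0.05)
batch = buffer.sample(1000)
rare_count = sum(1 for kind, _ in batch if kind == "rare")
```

With a plain FIFO buffer, the low-fairness transitions in this usage example would make up about 5% of a uniformly sampled batch; under the two-stage sampling sketched here, each of the two occupied bins is drawn with equal probability, so the rare transitions make up about half.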

Paper Structure (34 sections, 10 equations, 6 figures, 5 tables, 1 algorithm)

Introduction
Related Work
Fairness in Dynamic Pricing
Economics Foundations
Fairness in Sequential Decision Making
Problem Formulation and Methodology
Consumer Environment
From Policy Preferences to RL Design
Fixed Policy Mechanism: Multi-armed Bandit problem
Contextual Policy Mechanism: Contextual Multi-armed Bandit Problem
Evolving Policy Mechanism: RL Problem
Firms
Fairness-agnostic Firm
Fairness-aware Firm
Benevolent Social Planner
...and 19 more sections

Figures (6)

Figure 1: Firms are capable of efficiently learning profit-maximizing pricing assignments from consumer demand responses. The social planner (SP) learns these implicitly through firms' fairness and profit scores and designs incentive mechanisms that use taxation and subsidy, pushing firms to minimize the gap in demand responses between consumer groups.
Figure 2: Consumer profiles: Each firm serves two consumer groups. For each, group 1 may be considered to have lower tolerance to rising prices than group 2. The vertical purple and green lines represent the analytical profit-maximizing price assigned by each firm in the fairness-agnostic and fairness-aware cases respectively, with the resulting vertical gaps between the orange and blue lines illustrating important discrepancies in purchase probabilities between consumer groups under price allocations associated to both behaviours.
Figure 3: Left: Tax actions taken by the RL social planner. Reported policy mechanisms record SP actions averaged over 20 seeds. Right: Social welfare trajectories during evaluation for the SP's learned policy frameworks from multi-armed bandit, contextual bandit, and RL formulations.
Figure 4: FairReplayBuffer vs. FIFO for the RL setting
Figure 5: FairReplayBuffer vs. FIFO for the C-MAB setting
...and 1 more figure
