Reinforcement Learning for Durable Algorithmic Recourse

Marina Ceccon, Alessandro Fabris, Goran Radanović, Asia J. Biega, Gian Antonio Susto

TL;DR

This paper addresses delivering durable algorithmic recourse in dynamic, competitive settings by framing recourse as a time-aware reinforcement learning problem. It introduces a hierarchical RL architecture with a recourse recommender $\phi$ and a predictor $\mu$ that together generate counterfactuals and target scores while ensuring validity over a horizon $T$. The approach optimizes a multi-objective reward that balances recourse reliability, feasibility, and equity (via a Gini-based metric), and it uses Soft Actor-Critic (SAC) for training in both simplified and full environments. Experimental results on synthetic and German-credit-like data show Pareto-optimal trade-offs between reliability and feasibility across various horizons and difficulty settings, outperforming baselines that do not account for endogenous population dynamics. The work provides a foundation for durable, adaptive recourse in real-world, resource-constrained decision systems, while highlighting avenues for future work such as offline training, causal modeling, and handling distribution shifts.

Abstract

Algorithmic recourse seeks to provide individuals with actionable recommendations that increase their chances of receiving favorable outcomes from automated decision systems (e.g., loan approvals). While prior research has emphasized robustness to model updates, considerably less attention has been given to the temporal dynamics of recourse, particularly in competitive, resource-constrained settings where recommendations shape future applicant pools. In this work, we present a novel time-aware framework for algorithmic recourse, explicitly modeling how candidate populations adapt in response to recommendations. Additionally, we introduce a novel reinforcement learning (RL)-based recourse algorithm that captures the evolving dynamics of the environment to generate recommendations that are both feasible and valid. We design our recommendations to be durable, supporting validity over a predefined time horizon $T$. This durability allows individuals to confidently reapply after taking time to implement the suggested changes. Through extensive experiments in complex simulation environments, we show that our approach substantially outperforms existing baselines, offering a superior balance between feasibility and long-term validity. Together, these results underscore the importance of incorporating temporal and behavioral dynamics into the design of practical recourse systems.

Paper Structure

This paper contains 35 sections, 22 equations, 5 figures, 8 tables.

Figures (5)

  • Figure 1: Recourse invalidity. At $t=1$, four candidates apply, and the two with the highest scores are accepted. The decision threshold is $0.51$; following the state-of-the-art approach, rejected candidates (yellow) receive recommendations to reach this score. At $t=2$, the rejected candidates from $t=1$ (yellow) reapply, along with two new candidates (black). The yellow candidates have implemented the recommendations and raised their scores to around $0.51$. However, because of simultaneous recourse and a new candidate with a higher score, one reapplicant is still rejected.
  • Figure 2: Comparison of Pareto fronts of our method (blue), the hybrid method based on Ustun's approach (orange), the baseline using Ustun's approach (green), and the ARR method (red), across four settings with $T \in \{1,5\}$ and $\beta \in \{0.05,0.01\}$. Pareto fronts plot Recourse Reliability $\text{RR}_t^T$ and Recourse Feasibility $\text{RF}_t^T$, each averaged over ten evaluation episodes. The gray line at $\text{RR}_t^T=0.8$ denotes the high-reliability threshold, distinguishing configurations that achieve desirable recourse reliability.
  • Figure 3: Recourse Feasibility $\text{RF}_t^T$, for a fixed value of Recourse Reliability $\text{RR}_t^T$ ($\approx0.95$), and $\beta=0.05$, varying $T \in [1,5]$, for our method, the hybrid (based on Ustun's approach), and ARR.
  • Figure 4: Convergence curves in two identical settings ($\beta = 0.01$, $\alpha = 7$, $\tau = 5$), comparing $T \in \{1, 5\}$. The y-axis shows the average cumulative reward (smoothed over ten episodes), and the x-axis denotes the episode index.
  • Figure 5: Comparison of Pareto fronts of our method (blue line), the hybrid method based on Wachter's approach and DiCE (orange line), and the baseline method using Wachter's approach and DiCE (green dot), in a setting with $T=1$ and $\beta=0.05$. The Pareto fronts plot the Recourse Reliability $\text{RR}_t^T$ (averaged over ten evaluation episodes) against the Recourse Feasibility $\text{RF}_t^T$ (also averaged over ten evaluation episodes).
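
The recourse invalidity scenario described in the caption of Figure 1 can be reproduced with a few lines of Python. This is only an illustrative sketch, not the paper's environment: the candidate names, every score other than the $0.51$ threshold, the helper `accept_top_k`, and the alphabetical tie-breaking rule are all assumptions made for the example. It also assumes that the threshold equals the score of the lowest-ranked accepted candidate and that reapplicants reach it exactly, whereas the caption says only that they reach "around $0.51$".

```python
def accept_top_k(scores, k):
    """Return the names of the k highest-scoring candidates.

    Ties are broken alphabetically by name (an assumption of this sketch).
    """
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return {name for name, _ in ranked[:k]}


K = 2  # hypothetical number of available slots per time step

# t = 1: four candidates apply, the two with the highest scores are accepted.
round_1 = {"a": 0.80, "b": 0.51, "c": 0.40, "d": 0.30}  # hypothetical scores
accepted_1 = accept_top_k(round_1, K)

# The decision threshold is taken to be the lowest accepted score.
threshold = min(round_1[name] for name in accepted_1)
rejected_1 = set(round_1) - accepted_1

# t = 2: rejected candidates follow the recommendation and reapply at the
# old threshold, alongside two new candidates (hypothetical scores).
round_2 = {name: threshold for name in rejected_1}
round_2.update({"e": 0.70, "f": 0.35})
accepted_2 = accept_top_k(round_2, K)

# Reapplicants who met the old threshold but are rejected again.
still_rejected = rejected_1 - accepted_2

print("threshold at t=1:", threshold)
print("accepted at t=2:", sorted(accepted_2))
print("reapplicants still rejected:", sorted(still_rejected))
```

Because both reapplicants target the same static threshold while a stronger newcomer takes one of the two slots, one of them is rejected again even though the recommendation was followed. A recommendation computed against the score expected to be required over a horizon $T$, rather than against the current threshold, is intended to avoid this outcome.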