Reinforcement Learning for Durable Algorithmic Recourse
Marina Ceccon, Alessandro Fabris, Goran Radanović, Asia J. Biega, Gian Antonio Susto
TL;DR
This paper addresses delivering durable algorithmic recourse in dynamic, competitive settings by framing recourse as a time-aware reinforcement learning problem. It introduces a hierarchical RL architecture with a recourse recommender $\phi$ and a predictor $\mu$ that together generate counterfactuals and target scores while ensuring validity over a horizon $T$. The approach optimizes a multi-objective reward that balances recourse reliability, feasibility, and equity (via a Gini-based metric), and it uses Soft Actor-Critic (SAC) for training in both simplified and full environments. Experimental results on synthetic and German-credit-like data show Pareto-optimal trade-offs between reliability and feasibility across various horizons and difficulty settings, outperforming baselines that do not account for endogenous population dynamics. The work provides a foundation for durable, adaptive recourse in real-world, resource-constrained decision systems, while highlighting avenues for future work such as offline training, causal modeling, and handling distribution shifts.
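To make the Gini-based equity term concrete, the following is a minimal sketch of the standard Gini coefficient, not the paper's exact metric. It assumes the coefficient is computed over a hypothetical list of non-negative per-individual quantities (for example, the effort each applicant must spend to follow a recommendation); the summary does not specify which quantity the authors use, and the function name `gini` is illustrative.

```python
def gini(values):
    """Standard Gini coefficient of non-negative values.

    Returns 0.0 for perfect equality and approaches 1.0 as a single
    individual accounts for the whole total. Assumption: `values` holds
    a hypothetical per-individual quantity such as recourse effort.
    """
    x = sorted(float(v) for v in values)
    n = len(x)
    total = sum(x)
    if n == 0 or total == 0:
        # No individuals, or nothing to distribute: treat as equal.
        return 0.0
    # Rank-weighted sum over the sorted values (ranks start at 1).
    weighted = sum(rank * v for rank, v in enumerate(x, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n
```

A lower value indicates that the quantity is spread more evenly across individuals, so an equity-aware reward would plausibly penalize higher values; how the term is weighted against reliability and feasibility is not stated here.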
Abstract
Algorithmic recourse seeks to provide individuals with actionable recommendations that increase their chances of receiving favorable outcomes from automated decision systems (e.g., loan approvals). While prior research has emphasized robustness to model updates, considerably less attention has been given to the temporal dynamics of recourse, particularly in competitive, resource-constrained settings where recommendations shape future applicant pools. In this work, we present a novel time-aware framework for algorithmic recourse that explicitly models how candidate populations adapt in response to recommendations. We also introduce a reinforcement learning (RL)-based recourse algorithm that captures the evolving dynamics of the environment to generate recommendations that are both feasible and valid. We design our recommendations to be durable, meaning that they remain valid over a predefined time horizon T. This durability allows individuals to confidently reapply after taking time to implement the suggested changes. Through extensive experiments in complex simulation environments, we show that our approach substantially outperforms existing baselines, offering a superior balance between feasibility and long-term validity. Together, these results underscore the importance of incorporating temporal and behavioral dynamics into the design of practical recourse systems.
