Understanding Endogenous Data Drift in Adaptive Models with Recourse-Seeking Users

Bo-Yi Liu, Zhi-Xuan Liu, Kuan Lun Chen, Shih-Yu Tsai, Jie Gao, Hao-Tsung Yang

TL;DR

This work addresses endogenous data drift caused by recourse-seeking users in adaptive decision systems. It introduces a general framework for the deployment → user response → labeling → model update loop under limited resources and analyzes how recourse can push models toward higher decision standards, increasing recourse costs and reducing reliability. To mitigate these effects, the authors propose Fair-top-$k$ and Dynamic Continual Learning (DCL), both of which improve robustness and reduce recourse cost in experiments on synthetic and real data, with balanced accuracy remaining high in the long run. The findings connect to economic theories of competition and barriers to entry, and emphasize the importance of accounting for long-term feedback loops in recourse-aware systems. Overall, the work provides concrete methods to stabilize adaptive models facing strategic users and outlines directions for future theoretical and societal analyses.

Abstract

Deep learning models are widely used in decision-making and recommendation systems, where they typically rely on the assumption of a static data distribution between training and deployment. However, real-world deployment environments often violate this assumption. Users who receive negative outcomes may adapt their features to meet the model's criteria, that is, they take recourse actions. These adaptive behaviors shift the data distribution, and when models are retrained on the shifted data, a feedback loop emerges: user behavior influences the model, and the updated model in turn reshapes future user behavior. Despite its importance, this bidirectional interaction between users and models has received limited attention. In this work, we develop a general framework to model users' strategic behaviors and their interactions with decision-making systems under resource constraints and competitive dynamics. Both theoretical and empirical analyses show that user recourse behavior tends to push logistic and MLP models toward increasingly higher decision standards, resulting in higher recourse costs and less reliable recourse actions over time. To mitigate these challenges, we propose two methods, Fair-top-$k$ and Dynamic Continual Learning (DCL), which significantly reduce recourse cost and improve model robustness. Our findings draw connections to economic theories, highlighting how algorithmic decision-making can unintentionally reinforce a higher standard and generate endogenous barriers to entry.
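
The feedback loop described above (deployment, user response, labeling under limited resources, model update) can be illustrated with a toy simulation. The sketch below is not the authors' implementation: it assumes a single scalar feature, a fixed pool of users, a logistic model fitted by plain gradient descent, and a simple top-$k$ labeling rule. All function names and the parameters `budget` and `eps` are hypothetical and were chosen only for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, ys, w=1.0, b=0.0, lr=0.1, steps=300):
    """Fit p(y=1|x) = sigmoid(w*x + b) by gradient descent, warm-started at (w, b)."""
    n = len(xs)
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b


def top_k_labels(xs, k):
    """Assumed labeling rule: only the k largest feature values receive label 1."""
    order = sorted(range(len(xs)), key=lambda i: xs[i], reverse=True)
    labels = [0] * len(xs)
    for i in order[:k]:
        labels[i] = 1
    return labels


def simulate(rounds=10, n=200, k=60, budget=0.5, eps=0.05, seed=0):
    """Run the deploy -> respond -> label -> update loop; return (w, b) per round."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    w, b = fit_logistic(xs, top_k_labels(xs, k))
    history = [(w, b)]
    for _ in range(rounds):
        # User response: a rejected user moves just past the decision boundary
        # if the move costs no more than the (assumed) budget.
        if w != 0.0:
            target = -b / w + math.copysign(eps, w)
            for i, x in enumerate(xs):
                rejected = sigmoid(w * x + b) < 0.5
                if rejected and abs(target - x) <= budget:
                    xs[i] = target
        # Labeling under limited resources, then model update on the shifted data.
        ys = top_k_labels(xs, k)
        w, b = fit_logistic(xs, ys, w, b)
        history.append((w, b))
    return history


if __name__ == "__main__":
    for t, (w, b) in enumerate(simulate()):
        # -b / w is the feature value at which the model's score equals 0.5.
        print(t, round(-b / w, 3))
```

Printing the boundary `-b / w` per round gives one way to watch how the decision standard moves as users respond and the model is retrained. The fixed `k` stands in for the limited resources mentioned in the abstract; how closely this toy setup tracks the paper's formal definitions is an open assumption.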

Paper Structure

This paper contains 25 sections, 4 theorems, 28 equations, 4 figures, 3 tables.

Key Result

Theorem 1

Let $D,h$ be a dataset and the logistic model respectively at round $t$. For the responded dataset $D'$, there exists a model $h'$ that has a higher standard than $h$ and achieves higher accuracy on $D'$ if and only if there exists at least one recourse user, newly labeled as positive (i.e., class 1

Figures (4)

  • Figure 1: The experiment of model evolution with algorithmic recourse, where the model is updated with the top-$k$ labeling. These figures show the test dataset, which is sampled from the original distribution and remains unchanged across all rounds.
  • Figure 2: Higher Standard, Test Acceptance Rate, and STBA on Logistic Regression Model and MLP across three datasets.
  • Figure 3: (a) Average recourse cost of the Top-$k$ method across three datasets. (b) Fail to Recourse (FTR) of the Top-$k$ method across three datasets.
  • Figure 4: Higher Standard, Test Acceptance Rate, and STBA on Logistic Regression Model and MLP on Credit data. The outliers of Higher Standard on MLP with the Top-$k$ & CL method are -776.66 and -132.24 at rounds 40 and 41, and -360.13 at round 60.

Theorems & Definitions (8)

  • Definition 1: Resource saturation
  • Definition 2: Higher standard
  • Theorem 1: Higher Standard Provides Higher Accuracy
  • Lemma 1: Recourse action
  • proof
  • Proposition 1: Failed Recourse Actions Drive Higher Standards
  • proof
  • Proposition 2: Limiting Resource Drives Higher Standard