Towards Non-Adversarial Algorithmic Recourse

Tobias Leemann, Martin Pawelczyk, Bardh Prenkaj, Gjergji Kasneci

TL;DR

This work defines non-adversarial algorithmic recourse as counterfactual explanations that not only flip model predictions but also align with the ground-truth label in high-stakes, human-in-the-loop decisions. It formalizes a unified optimization framework for recourse and adversarial examples, and introduces NADV-based cost weighting to emphasize discriminative features. Theoretical results show how the choice of model, distance metric, and optimization routine shape non-adversarial outcomes, with a precise solution for linear models under noisy labels. Empirically, robust and accurate models, together with targeted cost weighting and adversarial training, reduce adversarial recourse across multiple tabular datasets, suggesting practical strategies for reliable, GDPR-compliant recourse in real-world decision-making.

Abstract

The streams of research on adversarial examples and counterfactual explanations have largely been growing independently. This has led to several recent works trying to elucidate their similarities and differences. Most prominently, it has been argued that adversarial examples, as opposed to counterfactual explanations, have a unique characteristic in that they lead to a misclassification compared to the ground truth. However, the computational goals and methodologies employed in existing counterfactual explanation and adversarial example generation methods often lack alignment with this requirement. Using formal definitions of adversarial examples and counterfactual explanations, we introduce non-adversarial algorithmic recourse and outline why, in high-stakes situations, it is imperative to obtain counterfactual explanations that do not exhibit adversarial characteristics. We subsequently investigate how different components in the objective functions, e.g., the machine learning model or cost function used to measure distance, determine whether the outcome can be considered an adversarial example or not. Our experiments on common datasets highlight that these design choices are often more critical in deciding whether recourse is non-adversarial than whether recourse or attack algorithms are used. Furthermore, we show that choosing a robust and accurate machine learning model results in less adversarial recourse, which is what is desired in practice.

Paper Structure (32 sections, 1 theorem, 16 equations, 10 figures, 2 tables)


Key Result

theorem 1

Suppose the data-generating process in Eqn. eq:linearprocess holds, that $\beta_i=0$ for $i \notin \mathcal{F}_{\text{disc}}$, and that $|\beta_i| > \alpha \in \mathbb{R}$ for $i \in \mathcal{F}_{\text{disc}}$. Then we can maximize the expected NADV$_p$ measure for $p\in \{1,2,\infty\}$ when using the feature weights […], where ${\bm{p}}_{\text{disc}}(\hat{\beta}_i)$ is the probability of feature $i$ being discriminative.

Figures (10)

  • Figure 1: Overview of the realistic decision-making scenario considered in this work. We consider the relevant case where an institution, e.g., a bank, deploys a machine learning model to support decision-making overseen by human experts who make final, case-based decisions based on the model's score (left). In such a setting, constructing recourse based only on the scoring model $f$ may lead to unreliable recourse because the experts' final decision $y$ is based on further restrictions, thereby representing a shifted decision boundary (right).
  • Figure 2: Visualizing our definitions. Valid recourse for a factual ${\bm{x}}$ crosses the classifier $f$'s estimated decision boundary (pink). The experts combine it with their expertise and restrictions into a latent decision boundary (blue). However, some recourse might not change the true label and is therefore considered adversarial (dashed arrow). The challenge is to obtain recourse that convinces the human experts. To this end, we are interested in finding the directions that lead to non-adversarial recourse (solid arrow).
  • Figure 3: Role of discriminative features in providing non-adversarial recourse. When features can be discriminative (i.e., class-relevant) or non-discriminative (i.e., noise features), exploiting the discriminative ones will eventually lead to non-adversarial recourse, whereas relying solely on the non-discriminative ones will result in an adversarial example. Nevertheless, even when selecting the correct features, several retry steps in the recourse direction may be required to cross the true decision boundary. To align recourse with discriminative features, the gradients of the model may serve as guidance, as we expect the discriminative dimensions to exhibit a higher gradient magnitude.
  • Figure 4: Both adversarial and recourse methods can succeed in producing non-adversarial recourse for ANNs. As it might not always be possible to change the ground truth immediately, we study the share of non-adversarial recourse instances after taking a certain number of retries $r$ (a higher share is better). We experiment with three recourse methods (SCFE, DICE, AR) and three adversarial methods (C&W, PGD, DeepFool). Our results indicate that DICE and PGD usually perform best in identifying non-adversarial counterfactuals. The other adversarial methods, C&W and DeepFool, often outperform the standard recourse method SCFE regarding non-adversarial recourse. Note that recourse methods strictly optimize for the lowest costs and are therefore less robust than adversarial methods, which incur higher costs.
  • Figure 5: Cost functions can play a role in generating non-adversarial recourse. (a) DICE results on the "admission" dataset with an ANN model. (b, c) Our NADV$_2$ cost function makes recourse slightly less adversarial for several methods and thereby reduces the number of retries required.
  • ...and 5 more figures
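
The retry idea from the captions of Figures 3 and 4 can be illustrated with a small simulation. The following is only a sketch under assumptions that are not taken from the paper: a hypothetical linear scoring model with weights `W`, an assumed expert rule `TRUE_THRESHOLD` that depends on the discriminative feature alone, and a "retry" modelled as repeating the same gradient-aligned step. It reports the share of non-adversarial recourse after each retry count $r$.

```python
import numpy as np

# Hypothetical toy setup (not from the paper): feature 0 is discriminative,
# feature 1 is a noise feature that the model weights only weakly.
W = np.array([2.0, 0.3])   # weights of an assumed linear scoring model f
TRUE_THRESHOLD = 1.0       # assumed stricter expert rule: y = 1 iff x0 > 1


def model_score(X):
    """Score of the assumed linear model; the positive class has score > 0."""
    return X @ W


def true_label(X):
    """Assumed ground truth that depends on the discriminative feature only."""
    return (X[:, 0] > TRUE_THRESHOLD).astype(int)


def recourse_step(X, margin=1e-3):
    """Smallest L2 step along the model gradient that reaches score = margin."""
    grad_dir = W / np.linalg.norm(W)
    dist = (margin - model_score(X)) / np.linalg.norm(W)
    return dist[:, None] * grad_dir


def share_non_adversarial(X, max_retries):
    """Share of points whose recourse flips both model and true label,
    for r = 0, ..., max_retries repetitions of the initial step."""
    step = recourse_step(X)
    shares = []
    for r in range(max_retries + 1):
        X_cf = X + (r + 1) * step
        valid = model_score(X_cf) > 0
        shares.append(float(np.mean(valid & (true_label(X_cf) == 1))))
    return shares


rng = np.random.default_rng(0)
X_all = rng.normal(size=(500, 2)) - np.array([1.5, 0.0])
X_neg = X_all[model_score(X_all) < 0]   # factuals rejected by the model
shares = share_non_adversarial(X_neg, max_retries=5)
print(shares)
```

Because the step direction has a positive component on the discriminative feature, the share cannot decrease with more retries in this toy setting. Real recourse and attack methods behave less predictably, which is what the paper's experiments examine.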

Theorems & Definitions (5)

  • definition 1: Adversarial Example (freiesleben2022intriguing)
  • definition 2: Recourse
  • definition 3: Non-Adversarial Recourse
  • definition 4: NADV measure
  • theorem 1: Optimal feature weights for recourse in linear models
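
The distinction drawn by definitions 1 to 3 can be sketched informally in code. The page lists only the definition names, so the sketch relies on the abstract's description instead: recourse flips the model prediction, and it is adversarial when the ground-truth label does not change with it. The model weights, the ground-truth rule, and the function names below are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical model: relies on a discriminative feature (index 0)
# and, wrongly, on a noise feature (index 1).
MODEL_W = np.array([1.0, 1.0])


def model_label(x):
    """Prediction of the assumed linear classifier f."""
    return int(np.dot(MODEL_W, x) > 0.0)


def true_label(x):
    """Assumed ground truth: only feature 0 matters."""
    return int(x[0] > 0.0)


def categorize(x_cf, target=1):
    """Classify a candidate counterfactual x_cf for the target class."""
    if model_label(x_cf) != target:
        return "invalid"            # model prediction not flipped: no recourse
    if true_label(x_cf) == target:
        return "non-adversarial"    # model and ground truth both flip
    return "adversarial"            # only the model is fooled


x = np.array([-1.0, -1.0])                     # factual, rejected by the model
print(categorize(np.array([-1.0, 2.0])))       # changes only the noise feature
print(categorize(np.array([0.5, -0.2])))       # changes the discriminative one
print(categorize(np.array([-1.0, 0.5])))       # too small a change
```

In this toy case, moving only the noise feature fools the model without changing the true label, which is the situation that non-adversarial recourse is meant to rule out.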