On the Trade-offs between Adversarial Robustness and Actionable Explanations

Satyapriya Krishna, Chirag Agarwal, Himabindu Lakkaraju

TL;DR

The paper asks whether adversarial robustness and actionable explanations can coexist in high-stakes ML settings. Through a theoretical and empirical study, it shows that increasing model robustness raises the cost of recourse and lowers its validity for both linear and non-linear models, using SCFE, C-CHVAE, and GSM as representative recourse methods. The authors derive explicit bounds on the differences in model weights and recourse costs between robust and non-robust models, and validate them on the German Credit, Adult, and COMPAS datasets, demonstrating a concrete robustness–recourse trade-off with practical implications for deploying trustworthy models. The findings highlight the need for design approaches that balance robustness against the ability to provide reliable, actionable recourse to affected individuals in real-world applications.

Abstract

As machine learning models are increasingly being employed in various high-stakes settings, it becomes important to ensure that predictions of these models are not only adversarially robust, but also readily explainable to relevant stakeholders. However, it is unclear if these two notions can be simultaneously achieved or if there exist trade-offs between them. In this work, we make one of the first attempts at studying the impact of adversarially robust models on actionable explanations which provide end users with a means for recourse. We theoretically and empirically analyze the cost (ease of implementation) and validity (probability of obtaining a positive model prediction) of recourses output by state-of-the-art algorithms when the underlying models are adversarially robust vs. non-robust. More specifically, we derive theoretical bounds on the differences between the cost and the validity of the recourses generated by state-of-the-art algorithms for adversarially robust vs. non-robust linear and non-linear models. Our empirical results with multiple real-world datasets validate our theoretical results and show the impact of varying degrees of model robustness on the cost and validity of the resulting recourses. Our analyses demonstrate that adversarially robust models significantly increase the cost and reduce the validity of the resulting recourses, thus shedding light on the inherent trade-offs between adversarial robustness and actionable explanations.
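The two quantities compared in the abstract, cost and validity, can be made concrete for a linear classifier. The sketch below is an illustration only and is not one of the recourse methods studied in the paper (SCFE, C-CHVAE, GSM). It assumes a linear model with logit $\mathbf{w}^{T}\mathbf{x} + b$, generates a minimal-$\ell_2$ recourse by moving each denied instance along $\mathbf{w}$ until its logit reaches a small positive margin, measures cost as the $\ell_2$ distance moved, and measures validity as the fraction of recourses that receive a positive prediction. The function names, the `margin` parameter, and the toy weights are hypothetical.

```python
import numpy as np

def linear_recourse(X, w, b, margin=0.1):
    """Minimal-l2 recourse for a linear model (hypothetical helper).

    Each row of X whose logit w.x + b is below `margin` is moved along w
    until its logit equals `margin`; other rows are left unchanged.
    """
    logit = X @ w + b
    step = np.maximum(margin - logit, 0.0) / (w @ w)
    return X + step[:, None] * w[None, :]

def recourse_cost(X, X_cf):
    """l2 cost of each recourse: distance between original and counterfactual."""
    return np.linalg.norm(X_cf - X, axis=1)

def validity(X_cf, w, b):
    """Fraction of recourses the model classifies as positive (logit > 0)."""
    return float(np.mean(X_cf @ w + b > 0))

# Toy example with made-up weights and two denied instances.
w, b = np.array([1.0, 1.0]), 0.0
X = np.array([[-1.0, 0.0], [0.0, -2.0]])
X_cf = linear_recourse(X, w, b)
print(recourse_cost(X, X_cf))  # equals (0.1 - logit) / ||w||_2 for each row
print(validity(X_cf, w, b))    # 1.0 by construction for this generator
```

Because this closed-form generator always reaches the target margin, its validity on the model it was computed for is trivially 1. For a linear model, the cost reduces to $(\text{margin} - \mathbf{w}^{T}\mathbf{x} - b)/\|\mathbf{w}\|_2$, so it depends on both the orientation and the offset of the decision boundary.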

Paper Structure

This paper contains 41 sections, 14 theorems, 51 equations, and 16 figures.

Key Result

Lemma 1

(Difference between non-robust and adversarially robust linear model weights) For an instance $\mathbf{x}$, let $\mathbf{w}_{\textup{NR}}$ and $\mathbf{w}_{\textup{R}}$ be the weights of the non-robust and adversarially robust linear models, respectively. Then, for a normalized Lipschitz activation function $\sigma$, ... where $\Delta = N\eta\,(y\|\mathbf{x}^{T}\|_{2} + \epsilon\sqrt{d})$, $\eta$ is the learning rate, ...
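The quantity $\Delta$ in Lemma 1 is simple arithmetic, and evaluating it shows how it scales: it grows linearly in $N$, $\eta$, and $\epsilon$, and its robustness term $N\eta\epsilon\sqrt{d}$ also grows with the square root of the input dimension. Because the lemma statement is truncated in this summary, the sketch assumes that $N$ is the number of training iterations, $d$ the input dimension, $y$ the label of $\mathbf{x}$, and $\epsilon$ the adversarial perturbation budget. The function name and input values are hypothetical.

```python
import math

def lemma1_delta(x, y, eps, lr, n_iters):
    """Evaluate Delta = N * eta * (y * ||x||_2 + eps * sqrt(d)) from Lemma 1.

    Assumed meanings (the summary truncates the lemma): n_iters = N,
    lr = eta, y = label of x, eps = perturbation budget, d = len(x).
    """
    d = len(x)
    x_norm = math.sqrt(sum(v * v for v in x))
    return n_iters * lr * (y * x_norm + eps * math.sqrt(d))

x = [3.0, 4.0]  # hypothetical instance: ||x||_2 = 5, d = 2
for eps in (0.0, 0.1, 0.3):
    print(eps, round(lemma1_delta(x, y=1, eps=eps, lr=0.01, n_iters=100), 4))
```

Only the $\epsilon\sqrt{d}$ term depends on the robustness budget, so at $\epsilon = 0$ the quantity reduces to $N\eta\,y\|\mathbf{x}\|_2$.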

Figures (16)

  • Figure 1: Analyzing validity differences between recourses generated using non-robust and adversarially robust wide neural networks for the German Credit, Adult, and COMPAS datasets. We find that validity decreases as $\epsilon$ increases. Refer to the appendix for similar results on larger neural networks.
  • Figure 2: Analyzing cost differences between recourses generated using non-robust and adversarially robust wide neural networks for the German Credit, Adult, and COMPAS datasets. We find that the cost difference (i.e., $\ell_{2}$-norm) between the recourses generated for non-robust and adversarially robust models increases as $\epsilon$ increases. A companion figure reports similar results on larger neural networks.
  • Figure 3: Analyzing validity differences between recourses generated using non-robust and adversarially robust logistic regression models for the German Credit, Adult, and COMPAS datasets. We find that validity decreases as $\epsilon$ increases. Refer to the appendix for similar results on larger neural networks.
  • Figure 4: Analyzing cost differences between recourses generated using non-robust and adversarially robust logistic regression models for the German Credit, Adult, and COMPAS datasets. We find that the cost difference (i.e., $\ell_{2}$-norm) between the recourses generated for non-robust and adversarially robust models increases as $\epsilon$ increases.
  • Figure 5: Cost and validity differences between recourses generated using non-robust and adversarially robust neural networks trained on the Adult dataset, examined as model size increases in depth (the number of hidden layers) and width (the number of nodes per hidden layer, for a network of depth 2). Our findings suggest that: i) the cost difference (i.e., $\ell_{2}$-norm) between the recourses generated for non-robust and adversarially robust models stays roughly constant as the model's depth or width increases, and ii) the validity of the recourses likewise stays roughly constant as depth or width increases. Here, the adversarially robust model is trained with $\epsilon = 0.3$.
  • ...and 11 more figures
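The experiments summarized in these figures compare recourses for models trained at several robustness levels $\epsilon$. The toy sketch below mirrors that pipeline for logistic regression only, under clearly stated assumptions: synthetic 2-D Gaussian data stands in for German Credit, Adult, and COMPAS; adversarial training uses $\ell_\infty$ perturbations, for which a linear model's worst-case perturbation has the closed form $-\epsilon y\,\mathrm{sign}(\mathbf{w})$ and lowers each margin by $\epsilon\|\mathbf{w}\|_1$; and recourse is the minimal-$\ell_2$ move to a small positive logit rather than SCFE, C-CHVAE, or GSM. The function names and hyperparameters are hypothetical, and the sketch is not expected to reproduce the paper's numbers.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_linf_adv_logreg(X, y, eps, lr=0.1, epochs=500):
    """Full-batch gradient descent on the l_inf adversarial logistic loss.

    Labels y are in {-1, +1}. For a linear model, the worst-case perturbation
    with ||delta||_inf <= eps is -eps * y * sign(w), which lowers every
    margin y * (w.x + b) by eps * ||w||_1. eps = 0 gives plain logistic regression.
    """
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        margin = y * (X @ w + b) - eps * np.abs(w).sum()  # robust margin
        g = -sigmoid(-margin)                              # d loss / d margin
        grad_w = (g[:, None] * (y[:, None] * X - eps * np.sign(w))).mean(axis=0)
        grad_b = (g * y).mean()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def mean_recourse_cost(X, w, b, target=0.1):
    """Mean l2 cost of moving each denied point (logit < 0) to logit = target."""
    logit = X @ w + b
    denied = logit < 0
    return float(np.mean((target - logit[denied]) / np.linalg.norm(w)))

# Synthetic stand-in data: two overlapping Gaussian blobs (hypothetical).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (200, 2)), rng.normal(1.0, 1.0, (200, 2))])
y = np.concatenate([-np.ones(200), np.ones(200)])

results = {}
for eps in (0.0, 0.1, 0.3):
    w, b = train_linf_adv_logreg(X, y, eps)
    results[eps] = mean_recourse_cost(X, w, b)
    print(f"eps={eps:.1f}  ||w||_2={np.linalg.norm(w):.3f}  mean cost={results[eps]:.3f}")
```

This sketch only tracks cost. With a closed-form projection, validity on the same model is 1 by construction, so it cannot show the validity effects the paper reports.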

Theorems & Definitions (24)

  • Lemma 1
  • Definition 1
  • Theorem 1
  • Definition 2
  • Theorem 2
  • Definition 3
  • Theorem 3
  • Theorem 4
  • Lemma 2
  • Theorem 5
  • ...and 14 more