Models That Are Interpretable But Not Transparent

Chudi Zhong, Panyu Chen, Cynthia Rudin

TL;DR

This work tackles the tension between providing faithful explanations for high-stakes decisions and protecting the model’s proprietary decision boundary. It introduces FaithfulDefense, a defense that generates explanations for inherently interpretable models (e.g., decision sets) by solving a maximum set cover problem to minimize disclosure while ensuring faithfulness. The authors provide greedy and exact (IP-based) algorithms, along with an augmented IP variant, and demonstrate that explanations remain faithful (FPR = 0) while substantially reducing information leakage and increasing the query budget required for effective model extraction. Empirical results on credit- and loan-related datasets show FaithfulDefense often yields slower or harder-to-use surrogates for attackers, especially on larger datasets, validating its practical potential for protecting intellectual property without sacrificing required explanations. The work highlights the trade-offs between interpretability, security, and compliance, and points to domain-specific considerations, including recourse and legal requirements.

Abstract

Faithful explanations are essential for machine learning models in high-stakes applications. Inherently interpretable models are well-suited for these applications because they naturally provide faithful explanations by revealing their decision logic. However, model designers often need to keep these models proprietary to maintain their value. This creates a tension: we need models that are interpretable, so that human decision-makers can understand and justify predictions, but not transparent, so that the model's decision boundary is not easily replicated by attackers. Shielding the model's decision boundary is particularly challenging alongside the requirement of completely faithful explanations, since such explanations reveal the true logic of the model for an entire subspace around each query point. This work provides an approach, FaithfulDefense, that creates model explanations for logical models that are completely faithful, yet reveal as little as possible about the decision boundary. FaithfulDefense is based on a maximum set cover formulation, and we provide multiple formulations for it, taking advantage of submodularity.
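
The maximum-coverage idea behind the set cover formulation can be illustrated with a minimal greedy sketch. This is not the authors' implementation. The function name `greedy_additional_conditions`, the dictionary-based sample format, the example conditions, and the `budget` parameter are all hypothetical, and the objective used here (add conditions the query already satisfies so that as many previously covered reference samples as possible fall outside the explanation) is an assumption inferred from the abstract.

```python
from typing import Callable, Dict, List, Set

Sample = Dict[str, float]
Condition = Callable[[Sample], bool]


def greedy_additional_conditions(
    query: Sample,
    base_covered: List[Sample],
    candidates: Dict[str, Condition],
    budget: int,
) -> List[str]:
    """Greedily pick up to `budget` extra conditions satisfied by the query,
    each time taking the one that excludes the most still-covered samples.

    Hypothetical sketch: `base_covered` stands for reference samples that
    already satisfy the base rule the query triggers.
    """
    # Only conditions the query itself satisfies keep the explanation valid for it.
    usable = {name: cond for name, cond in candidates.items() if cond(query)}
    # For each usable condition, the indices of samples it would exclude.
    excludes: Dict[str, Set[int]] = {
        name: {i for i, s in enumerate(base_covered) if not cond(s)}
        for name, cond in usable.items()
    }
    chosen: List[str] = []
    excluded: Set[int] = set()
    for _ in range(budget):
        best, best_gain = None, 0
        for name in sorted(excludes):  # sorted for deterministic tie-breaking
            if name in chosen:
                continue
            gain = len(excludes[name] - excluded)  # newly excluded samples
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:  # no remaining condition excludes anything new
            break
        chosen.append(best)
        excluded |= excludes[best]
    return chosen


# Toy data (invented for illustration only).
query = {"income": 60, "debt": 5, "age": 40}
base_covered = [
    {"income": 55, "debt": 20, "age": 30},
    {"income": 70, "debt": 3, "age": 25},
    {"income": 80, "debt": 30, "age": 50},
    {"income": 65, "debt": 4, "age": 45},
]
candidates = {
    "debt<10": lambda s: s["debt"] < 10,
    "age>35": lambda s: s["age"] > 35,
    "age<30": lambda s: s["age"] < 30,  # not satisfied by the query, never used
}
print(greedy_additional_conditions(query, base_covered, candidates, budget=2))
```

Every added condition is one the query satisfies, and adding conditions to a conjunction can only shrink the region it describes, so any point matching the extended explanation still matches the base rule. For maximum coverage under a cardinality limit, the greedy rule is the standard approximation that exploits submodularity; an exact variant would replace the loop with an integer program.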

Paper Structure

This paper contains 17 sections, 1 theorem, 11 equations, 10 figures, 3 tables, 2 algorithms.

Key Result

Theorem 1

Let $q$ be the query with $f(q)=1$, $e^{\textrm{base}}$ be the set of conditions used by the rule in $f$ that $q$ satisfies, and $C_{q} \subseteq C \backslash e^{\textrm{base}}$ be the additional conditions satisfied by $q$. Problem eq:prob2 of selecting a subset $e^{\textrm{add}} \subseteq C_{q}$ s

Figures (10)

  • Figure 1: Number of queries vs. the proportion of positive samples covered by explanations. Lower curves are better. FaithfulDefense (red, orange, pink curves) captures fewer positive samples in the test sets for all three datasets under all three querying strategies. LIME appears only for perturbation queries: the attacker perturbs the LIME explanation to generate the next query, so LIME explanations are incompatible with the other querying strategies. (max length $l=3$)
  • Figure 2: Time consumption of generating explanations when the attacker uses the perturbation strategy. (max length $l=3$)
  • Figure 3: Comparison of test performance between base and surrogate models. CART is used by the attacker to train the surrogate model. (max length $l=3$).
  • Figure 4: Number of queries vs. the proportion of positive samples covered by explanations on the training set. (max length $l=3$).
  • Figure 5: Time consumption of generating explanations. (max length $l=3$)
  • ...and 5 more figures

Theorems & Definitions (3)

  • Definition 1
  • Theorem 1
  • proof