Shapley-Guided Neural Repair Approach via Derivative-Free Optimization

Xinyu Sun, Wanwei Liu, Haoang Chi, Tingyu Chen, Xiaoguang Mao, Shangwen Wang, Lei Bu, Jingyi Wang, Yang Tan, Zhenyi Qi

Abstract

DNNs are susceptible to defects such as backdoors, adversarial attacks, and unfairness, which undermine their reliability. Existing repair approaches mainly rely on retraining, optimization, constraint solving, or search algorithms. However, most methods depend on gradient computation, which restricts their applicability to specific activation functions (e.g., ReLU), or employ search algorithms whose localization and repair steps are not interpretable. Furthermore, they often lack generalizability across multiple properties. We propose SHARPEN, which integrates interpretable fault localization with a derivative-free optimization strategy. First, SHARPEN introduces a Deep SHAP-based localization strategy that quantifies each layer's and neuron's marginal contribution to erroneous outputs. Specifically, a hierarchical coarse-to-fine approach reranks layers by their aggregated impact, then locates faulty neurons/filters by analyzing activation divergences between property-violating and benign states. Subsequently, SHARPEN incorporates CMA-ES to repair the identified neurons. CMA-ES leverages a covariance matrix to capture variable dependencies, enabling gradient-free search and coordinated adjustments across coupled neurons. By combining interpretable localization with evolutionary optimization, SHARPEN enables derivative-free repair across architectures and is less sensitive to gradient anomalies and hyperparameter choices. We demonstrate SHARPEN's effectiveness on three repair tasks. Balancing property repair with accuracy preservation, it outperforms baselines in backdoor removal (+10.56%), adversarial mitigation (+5.78%), and unfairness repair (+11.82%). Notably, SHARPEN handles diverse tasks, and its modular design is plug-and-play with different derivative-free optimizers, highlighting its flexibility.
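To make the localization step concrete, the following minimal sketch illustrates the kind of Deep SHAP-based neuron attribution the abstract describes. It is not the paper's implementation: the toy fully connected model, the split index, the random stand-in data, and the use of the `shap` package's `DeepExplainer` as the attribution backend are all assumptions for illustration.

```python
# Hedged sketch of Deep SHAP-based neuron localization (illustrative assumptions:
# a toy fully connected PyTorch model, a fixed split index, random stand-in data,
# and the `shap` package's DeepExplainer as the attribution backend).
import numpy as np
import torch
import torch.nn as nn
import shap

torch.manual_seed(0)

# Toy classifier standing in for the network under repair.
model = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 10),
).eval()

k = 2                      # split point chosen by a prior (coarse) layer ranking
head, tail = model[:k], model[k:]

# Benign (background) inputs and property-violating inputs; random stand-ins here.
benign_x = torch.randn(100, 20)
violating_x = torch.randn(32, 20)

# Attribute the tail's outputs to the hidden activations produced by the head, so
# each SHAP value measures one hidden neuron's marginal contribution to the output.
with torch.no_grad():
    benign_h = head(benign_x)
    violating_h = head(violating_x)

explainer = shap.DeepExplainer(tail, benign_h)
sv = explainer.shap_values(violating_h)
# Depending on the shap version, `sv` is a list of per-class arrays or one array.
sv = np.stack(sv, axis=-1) if isinstance(sv, list) else np.asarray(sv)

# Fine-grained step: aggregate absolute contributions over violating samples and
# output classes, then flag the hidden neurons with the largest aggregate impact.
scores = np.abs(sv).mean(axis=0)
scores = scores.sum(axis=-1) if scores.ndim > 1 else scores
suspicious = np.argsort(-scores)[:10]
print("Most suspicious hidden neurons:", suspicious)
```

The same idea extends to convolutional layers by aggregating attributions over each filter's spatial activation map, so that filters rather than individual units are ranked.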

Paper Structure

This paper contains 24 sections, 7 equations, 9 figures, 7 tables, and 2 algorithms.

Figures (9)

  • Figure 1: Motivation of SHARPEN. (A) Dilemma: existing methods trade off between blind exploration and rigid constraints that do not scale. (B) Insight: recognizing that neural network faults are often concentrated in a sparse set of critical neurons, we leverage interpretability to constrain the search space, transforming a high-dimensional optimization into a local problem. (C) Goal: achieving utility-preserving repair, satisfying the property ($V(f') \leq \tau$) while preserving accuracy ($\Delta Acc \leq \epsilon$); a sketch casting this goal as a CMA-ES fitness function follows this figure list.
  • Figure 2: The overall workflow of SHARPEN.
  • Figure 3: The workflow of Deep SHAP-based coarse-to-fine fault localization.
  • Figure 4: Examples of backdoor attacks (BadNet, Blend, and WaNet) on an ImageNet10 "tench/goldfish" image.
  • Figure 5: Layer contribution analysis for the BadNets removal case study on VGG16.
  • ...and 4 more figures
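Figure 1's goal, $V(f') \leq \tau$ subject to $\Delta Acc \leq \epsilon$, can be cast as a penalized fitness for a derivative-free optimizer. The sketch below continues the localization example above (reusing its `model`, `benign_x`, `violating_x`, and `suspicious` variables) and uses the `cma` package as a stand-in CMA-ES backend; the choice of which weights to adjust, the surrogate violation measure, and the penalty constants are illustrative assumptions, not the paper's configuration.

```python
# Hedged sketch of CMA-ES-based repair of the localized neurons (continues the
# localization sketch above; `model`, `benign_x`, `violating_x`, and `suspicious`
# are reused, and all repair/penalty choices here are illustrative assumptions).
import cma
import torch

# Parameters to repair: incoming weights of the suspicious neurons in the head's
# first linear layer (one of several reasonable choices of what to adjust).
layer = model[0]
idx = torch.as_tensor(suspicious, dtype=torch.long)
theta0 = layer.weight.data[idx].flatten().numpy()

# Stand-in evaluation data: benign labels, plus the original (erroneous)
# predictions on the violating inputs that the repair should overturn.
benign_y = torch.randint(0, 10, (benign_x.shape[0],))
with torch.no_grad():
    base_acc = (model(benign_x).argmax(1) == benign_y).float().mean().item()
    bad_pred = model(violating_x).argmax(1)

def set_theta(theta):
    """Write a candidate parameter vector back into the localized weights."""
    layer.weight.data[idx] = torch.as_tensor(
        theta, dtype=torch.float32).reshape(len(idx), -1)

def fitness(theta, tau=0.05, eps=0.02, lam=10.0):
    set_theta(theta)
    with torch.no_grad():
        # Surrogate property violation V(f'): fraction of violating inputs still
        # mapped to their original erroneous label.
        v = (model(violating_x).argmax(1) == bad_pred).float().mean().item()
        acc = (model(benign_x).argmax(1) == benign_y).float().mean().item()
    # Soft-constraint form of Figure 1's goal: push V below tau while keeping
    # the accuracy drop within eps.
    return max(v - tau, 0.0) + lam * max(base_acc - acc - eps, 0.0)

# Derivative-free search: CMA-ES only queries the fitness function.
es = cma.CMAEvolutionStrategy(theta0, 0.1, {"maxiter": 50, "verbose": -9})
while not es.stop():
    candidates = es.ask()
    es.tell(candidates, [fitness(c) for c in candidates])
set_theta(es.result.xbest)   # install the best repair found
```

Because the optimizer only queries the fitness function, the same loop works unchanged if CMA-ES is swapped for another derivative-free optimizer, which is the plug-and-play property highlighted in the abstract.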

Theorems & Definitions (2)

  • Definition 1: Unified Attack Success Rate
  • Definition 2: Unfairness via KL Divergence
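The page lists only the definition titles. As a hedged illustration (a common formulation, not necessarily the paper's exact Definition 2), unfairness via KL divergence can measure how far the model's output distribution for one protected group diverges from that of another:

$$\mathrm{Unfair}(f) = D_{\mathrm{KL}}\big(P(f(x) \mid a = 0)\,\|\,P(f(x) \mid a = 1)\big) = \sum_{y} P(y \mid a = 0)\,\log\frac{P(y \mid a = 0)}{P(y \mid a = 1)},$$

where $a$ denotes a protected attribute; a value of $0$ indicates identical output distributions across the two groups.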