Table of Contents
Fetching ...

Provable Repair of Deep Neural Network Defects by Preimage Synthesis and Property Refinement

Jianan Ma, Jingyi Wang, Qi Xuan, Zhen Wang

TL;DR

ProRepair presents a provable neural network repair framework that unifies defenses against backdoors, corruption, adversarial, and safety-violation failures by synthesizing a small proxy box to characterize the feature-space preimage and applying adaptive property refinement. By repairing the early feature extractor rather than the classifier, ProRepair preserves the preimage structure and provides formal guarantees via linear-relaxation based bound propagation, with a bounded distance measure guiding edits. The approach combines point-wise proxy-box synthesis and region-wise counterexample guided refinement, achieving up to 5×–2000× speedups over prior provable methods and solving all 36 safety-property violations in ACAS Xu while scaling to high-dimensional spaces. Empirical results across four repair tasks and six benchmarks show strong efficacy, robustness to activation functions, and notable generalization improvements, illustrating its practical potential for post-deployment safety in real-world systems. The work also provides an open-source toolkit to facilitate broader adoption and future enhancements in provable NN repair.

Abstract

It is known that deep neural networks may exhibit dangerous behaviors under various security threats (e.g., backdoor attacks, adversarial attacks and safety property violation) and there exists an ongoing arms race between attackers and defenders. In this work, we propose a complementary perspective to utilize recent progress on "neural network repair" to mitigate these security threats and repair various kinds of neural network defects (arising from different security threats) within a unified framework, offering a potential silver bullet solution to real-world scenarios. To substantially push the boundary of existing repair techniques (suffering from limitations such as lack of guarantees, limited scalability, considerable overhead, etc) in addressing more practical contexts, we propose ProRepair, a novel provable neural network repair framework driven by formal preimage synthesis and property refinement. The key intuitions are: (i) synthesizing a precise proxy box to characterize the feature space preimage, which can derive a bounded distance term sufficient to guide the subsequent repair step towards the correct outputs, and (ii) performing property refinement to enable surgical corrections and scale to more complex tasks. We evaluate ProRepair across four security threats repair tasks on six benchmarks and the results demonstrate it outperforms existing methods in effectiveness, efficiency and scalability. For point-wise repair, ProRepair corrects models while preserving performance and achieving significantly improved generalization, with a speedup of 5x to 2000x over existing provable approaches. In region-wise repair, ProRepair successfully repairs all 36 safety property violation instances (compared to 8 by the best existing method), and can handle 18x higher dimensional spaces.

Provable Repair of Deep Neural Network Defects by Preimage Synthesis and Property Refinement

TL;DR

ProRepair presents a provable neural network repair framework that unifies defenses against backdoors, corruption, adversarial, and safety-violation failures by synthesizing a small proxy box to characterize the feature-space preimage and applying adaptive property refinement. By repairing the early feature extractor rather than the classifier, ProRepair preserves the preimage structure and provides formal guarantees via linear-relaxation based bound propagation, with a bounded distance measure guiding edits. The approach combines point-wise proxy-box synthesis and region-wise counterexample guided refinement, achieving up to 5×–2000× speedups over prior provable methods and solving all 36 safety-property violations in ACAS Xu while scaling to high-dimensional spaces. Empirical results across four repair tasks and six benchmarks show strong efficacy, robustness to activation functions, and notable generalization improvements, illustrating its practical potential for post-deployment safety in real-world systems. The work also provides an open-source toolkit to facilitate broader adoption and future enhancements in provable NN repair.

Abstract

It is known that deep neural networks may exhibit dangerous behaviors under various security threats (e.g., backdoor attacks, adversarial attacks and safety property violation) and there exists an ongoing arms race between attackers and defenders. In this work, we propose a complementary perspective to utilize recent progress on "neural network repair" to mitigate these security threats and repair various kinds of neural network defects (arising from different security threats) within a unified framework, offering a potential silver bullet solution to real-world scenarios. To substantially push the boundary of existing repair techniques (suffering from limitations such as lack of guarantees, limited scalability, considerable overhead, etc) in addressing more practical contexts, we propose ProRepair, a novel provable neural network repair framework driven by formal preimage synthesis and property refinement. The key intuitions are: (i) synthesizing a precise proxy box to characterize the feature space preimage, which can derive a bounded distance term sufficient to guide the subsequent repair step towards the correct outputs, and (ii) performing property refinement to enable surgical corrections and scale to more complex tasks. We evaluate ProRepair across four security threats repair tasks on six benchmarks and the results demonstrate it outperforms existing methods in effectiveness, efficiency and scalability. For point-wise repair, ProRepair corrects models while preserving performance and achieving significantly improved generalization, with a speedup of 5x to 2000x over existing provable approaches. In region-wise repair, ProRepair successfully repairs all 36 safety property violation instances (compared to 8 by the best existing method), and can handle 18x higher dimensional spaces.

Paper Structure

This paper contains 39 sections, 2 theorems, 10 equations, 13 figures, 9 tables, 2 algorithms.

Key Result

Theorem 1

Let $\phi$ be a point-wise property with $\phi^{in} = \{\bm{x}_{\phi}\}$, and $\mathcal{B}_{\phi}$ be the proxy box of radius $r$ centered at $\bm{h}_{\phi}^*$. Given a repaired model $\tilde{f}:=f_\mathrm{c} \circ \tilde{f}_{\mathrm{e}}$, the distance $\mathcal{L} := \lVert \tilde{f}_{\mathrm{e}}(\ where $p \ge 1$ and $\mathrm{s}$ is the feature space dimension. We further have that if $\mathcal{

Figures (13)

  • Figure 1: The overview of ProRepair framework.
  • Figure 2: t-SNE visualization of feature representations where each group of colored points can be approximately represented as the feature space preimage (set of all features mapping to correct outputs) for the corresponding class. (a) Original network shows well-separated clusters indicating high accuracy; (b) APRNN-repaired net exhibits merged clusters (40% accuracy drop) despite identical feature extraction; (c) Our method repairs the feature extractor while keeping the well-separated preimages unchanged (1.5% accuracy drop).
  • Figure 3: Illustration of refinement: the approximate input does not violate constraints before refinement, while a valid counterexample is captured by splitting the input space.
  • Figure 4: Results of adversarial attack repair. The x-axis represents the dimensionality of the space defined in desired properties. The scatter points on box sides represent the accuracy after each repair task. We conducted the experiments 20 times, each with randomly selected data to construct the desired properties. Values above the x-axis show the number of successful/total repairs, with red for APRNN and blue for ProRepair.
  • Figure 5: Results of backdoor repair on CIFAR-100 with different numbers of properties (x-axis).
  • ...and 8 more figures

Theorems & Definitions (4)

  • Definition 2.1: Desired Property Specification
  • Definition 2.2: Neural network preimage
  • Theorem 1
  • Theorem 2