Abstract Gradient Training: A Unified Certification Framework for Data Poisoning, Unlearning, and Differential Privacy

Philip Sosnin, Matthew Wicker, Josh Collyer, Calvin Tsay

TL;DR

This work introduces Abstract Gradient Training (AGT), a unified framework for certifying model robustness against training-time perturbations arising from data poisoning, unlearning, and differential privacy. By shifting the certification target from perturbations of the data to a parameter-space enclosure, AGT defines valid parameter-space domains $\Theta$ that bound all possible trained parameters under admissible perturbations and training dynamics. The framework offers two instantiations: a scalable interval-based approach (Interval Bound Propagation) and a tighter, exact, but computationally heavier optimization-based approach built on mixed-integer programming (including MIQCP formulations) with decompositions. Through experiments on data-poisoning scenarios, unlearning tasks, and privacy-sensitive prediction, AGT exposes the trade-off between certificate tightness and computational cost, and is applied to settings such as autonomous driving and medical imaging. Overall, AGT provides provable, attack-model-aware guarantees that enhance the safety and privacy of ML models in sensitive applications, while outlining directions for tighter bounds and scalable certification.

Abstract

The impact of inference-time data perturbation (e.g., adversarial attacks) has been extensively studied in machine learning, leading to well-established certification techniques for adversarial robustness. In contrast, certifying models against training data perturbations remains a relatively under-explored area. These perturbations can arise in three critical contexts: adversarial data poisoning, where an adversary manipulates training samples to corrupt model performance; machine unlearning, which requires certifying model behavior under the removal of specific training data; and differential privacy, where guarantees must be given with respect to substituting individual data points. This work introduces Abstract Gradient Training (AGT), a unified framework for certifying robustness of a given model and training procedure to training data perturbations, including bounded perturbations, the removal of data points, and the addition of new samples. By bounding the reachable set of parameters, i.e., establishing provable parameter-space bounds, AGT provides a formal approach to analyzing the behavior of models trained via first-order optimization methods.

Paper Structure

This paper contains 98 sections, 6 theorems, 70 equations, 13 figures, 4 tables.

Key Result

Theorem 2

Let $\Theta$ be a valid parameter-space domain for a given perturbation model $\mathcal{T}$. Then, for any objective function $J$, the worst-case impact of training data perturbations can be bounded by optimizing over the parameter space instead of the dataset space:
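A sketch of the inequality in assumed notation (the symbols $\mathcal{D}$, $\mathcal{T}(\mathcal{D})$, and $M$ are introduced here for illustration and may differ from the paper's): write $\mathcal{D}$ for the nominal training set, $\mathcal{T}(\mathcal{D})$ for the set of perturbed datasets admitted by the perturbation model, and $M(\mathcal{D}')$ for the parameters obtained by training on $\mathcal{D}'$. Validity of $\Theta$ means $M(\mathcal{D}') \in \Theta$ for every $\mathcal{D}' \in \mathcal{T}(\mathcal{D})$, which gives:

```latex
\max_{\mathcal{D}' \in \mathcal{T}(\mathcal{D})} J\bigl(M(\mathcal{D}')\bigr)
\;\le\;
\max_{\theta^\star \in \Theta} J(\theta^\star)
```

The right-hand side ranges over a set that contains every reachable parameter vector, so its maximum can only over-estimate the left-hand side.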

Figures (13)

  • Figure 1: Illustration of training-time certification (left) vs inference-time certification (right). The goal of training-time certification is to verify the behavior of the perturbed model for any possible training data perturbation. Unlike inference-time certification, training-time certification requires reasoning over the entire training process.
  • Figure 2: Outline of our certification framework. During training, parameter-space bounds are propagated through each iteration, admitting some over-approximation. Post-training, the final parameter-space bounds are used to certify the network against a given specification. While we illustrate certification in logit space, our framework supports certification with respect to any desired criterion.
  • Figure 3: Summary of AGT with interval domains: (1) Forward pass bounds are computed using either interval bound propagation or interval-CROWN. (2) Per-sample gradient bounds are computed using interval backpropagation. (3) Descent direction bounds are computed using sound aggregation of per-sample gradients with respect to a given perturbation model. (4) The updated parameter interval is computed using interval arithmetic.
  • Figure 4: Bounding per-sample gradients using interval bound propagation. Top: Interval propagation through the forward pass, with respect to both input and parameter intervals. Bottom: Interval propagation through the backward pass of the network. Gradients with respect to model parameters are omitted for clarity.
  • Figure 5: Sound aggregation of per-sample gradients for our three perturbation models. Each case illustrates the procedure for upper-bounding a single parameter index, with the full bounds obtained by independently repeating the process for each parameter index. The lower bound is not shown but follows analogous operations.
  • ...and 8 more figures
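The four steps summarized in the Figure 3 caption can be sketched for a toy setting. This is a minimal illustration under stated assumptions, not the paper's implementation: the model is linear with squared loss, gradients are clipped elementwise to `[-clip, clip]`, and the perturbation model (an assumption chosen for simplicity) lets at most `k` samples per batch be replaced by arbitrary points. All function names are hypothetical, and floating-point rounding is not handled soundly.

```python
import numpy as np

def grad_bounds_linear(w_l, w_u, X, y):
    """Bounds on per-sample gradients of 0.5*(w.x - y)^2 for all w in [w_l, w_u]."""
    # (1) Forward pass: interval bounds on w.x for each sample.
    lo = np.minimum(X * w_l, X * w_u).sum(axis=1)
    hi = np.maximum(X * w_l, X * w_u).sum(axis=1)
    r_l, r_u = lo - y, hi - y  # residual bounds, shape (b,)
    # (2) Backward pass: the gradient is residual * x, so bound the product.
    p, q = r_l[:, None] * X, r_u[:, None] * X
    return np.minimum(p, q), np.maximum(p, q)

def aggregate(g_l, g_u, k, clip):
    """(3) Bounds on the mean clipped gradient when up to k samples are replaced."""
    # Clipping is monotone, so clipping the bounds bounds the clipped gradient.
    g_l, g_u = np.clip(g_l, -clip, clip), np.clip(g_u, -clip, clip)
    b = g_l.shape[0]
    # Replacing sample i can raise the sum by at most clip - g_u[i] and lower
    # it by at most g_l[i] + clip; take the k largest changes per coordinate.
    gain_u = np.sort(clip - g_u, axis=0)[b - k:].sum(axis=0)
    gain_l = np.sort(g_l + clip, axis=0)[b - k:].sum(axis=0)
    return (g_l.sum(axis=0) - gain_l) / b, (g_u.sum(axis=0) + gain_u) / b

def agt_step(w_l, w_u, X, y, k, clip, lr):
    """(4) One interval SGD step: returns the updated parameter interval."""
    g_l, g_u = grad_bounds_linear(w_l, w_u, X, y)
    d_l, d_u = aggregate(g_l, g_u, k, clip)
    # Subtracting an interval swaps the roles of its endpoints.
    return w_l - lr * d_u, w_u - lr * d_l
```

Starting from a point interval `w_l = w_u = w0` and calling `agt_step` once per batch yields an interval that, under the assumptions above, contains the parameters reached by clipped SGD on any admissible perturbed dataset. With `k = 0` the interval stays degenerate and the step reduces to ordinary clipped SGD.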

Theorems & Definitions (12)

  • Definition 1: $(\epsilon, \delta)$-Differential Privacy (Dwork and Roth, 2014)
  • Theorem 2
  • Remark 3
  • Remark 4
  • Remark 5
  • Theorem 6
  • Remark 7
  • Theorem 8
  • Theorem 9
  • Definition 10: Smooth Sensitivity (Nissim et al., 2007)
  • ...and 2 more