FullCert: Deterministic End-to-End Certification for Training and Inference of Neural Networks

Tobias Lorenz, Marta Kwiatkowska, Mario Fritz

TL;DR

FullCert addresses the lack of training-time robustness guarantees by delivering deterministic end-to-end certificates that cover both data poisoning during training and evasion at inference. It treats training and inference as a single composite function and uses abstract interpretation with interval bounds (implemented in BoundFlow) to propagate bounded perturbations through the unrolled computation. The certificate is sound: it is issued only when every possible output remains correct. The approach yields a family of models under perturbations, $\mathcal{F}_\Theta$, and certifies predictions for all perturbations within $\mathcal{S}$. It is demonstrated on two datasets (Two-Moons and MNIST 1/7) with small perturbation budgets, showing feasibility despite the computational cost. The work contributes a formal problem definition for deterministic end-to-end certification, a convergence analysis, and the open-source BoundFlow library for bound-based certified training and inference. FullCert is presented as the first certifier to give deterministic guarantees against both training-time poisoning and inference-time adversarial perturbations in neural networks.
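
The certification condition described above can be written out explicitly. The following is one possible formalization, not necessarily the paper's own. The notation beyond $\mathcal{F}_\Theta$ and $\mathcal{S}$ is assumed for illustration: $A$ is the training algorithm, $D$ the clean training set, $\mathcal{S}_{\mathrm{train}}(D)$ and $\mathcal{S}_{\mathrm{test}}(x)$ the admissible perturbations of the training set and of a test input $x$, $\Theta$ the computed parameter bounds, and $y$ the correct label.

$$
\Theta \supseteq \{\, A(D') : D' \in \mathcal{S}_{\mathrm{train}}(D) \,\}, \qquad
\mathcal{F}_\Theta = \{\, f_\theta : \theta \in \Theta \,\},
$$

$$
\text{certified at } x \iff \forall f_\theta \in \mathcal{F}_\Theta,\ \forall x' \in \mathcal{S}_{\mathrm{test}}(x):\ f_\theta(x') = y .
$$

Because $\Theta$ over-approximates the reachable parameters, the condition is sufficient but not necessary: a failed check does not imply that an attack exists.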

Abstract

Modern machine learning models are sensitive to the manipulation of both the training data (poisoning attacks) and inference data (adversarial examples). Recognizing this issue, the community has developed many empirical defenses against both attacks and, more recently, certification methods with provable guarantees against inference-time attacks. However, such guarantees are still largely lacking for training-time attacks. In this work, we present FullCert, the first end-to-end certifier with sound, deterministic bounds, which proves robustness against both training-time and inference-time attacks. We first bound all possible perturbations an adversary can make to the training data under the considered threat model. Using these constraints, we bound the perturbations' influence on the model's parameters. Finally, we bound the impact of these parameter changes on the model's prediction, resulting in joint robustness guarantees against poisoning and adversarial examples. To facilitate this novel certification paradigm, we combine our theoretical work with a new open-source library BoundFlow, which enables model training on bounded datasets. We experimentally demonstrate FullCert's feasibility on two datasets.
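
The three bounding steps in the abstract can be illustrated with plain interval arithmetic. The sketch below is a minimal illustration, not BoundFlow and not the authors' implementation. It assumes a linear model trained by full-batch gradient descent on a squared loss, an $\ell_\infty$ budget `eps` on every training feature, and labels in {-1, +1}. All function names (`imul`, `forward`, `train_bounds`, `certify`) are hypothetical, and floating-point rounding is ignored.

```python
import numpy as np

def imul(a_lo, a_hi, b_lo, b_hi):
    """Elementwise interval product [a_lo, a_hi] * [b_lo, b_hi] (with broadcasting)."""
    corners = np.stack(np.broadcast_arrays(
        a_lo * b_lo, a_lo * b_hi, a_hi * b_lo, a_hi * b_hi))
    return corners.min(axis=0), corners.max(axis=0)

def forward(X_lo, X_hi, w_lo, w_hi, b_lo, b_hi):
    """Bounds on the outputs X @ w + b for interval inputs and interval parameters."""
    p_lo, p_hi = imul(X_lo, X_hi, w_lo, w_hi)          # shape (n, d)
    return p_lo.sum(axis=1) + b_lo, p_hi.sum(axis=1) + b_hi

def train_bounds(X, y, eps, steps=10, lr=0.1):
    """Bound the parameters reachable by gradient descent on any poisoned dataset."""
    # Step 1: bound every admissible perturbation of the training data.
    X_lo, X_hi = X - eps, X + eps
    w_lo = np.zeros(X.shape[1])
    w_hi = np.zeros(X.shape[1])
    b_lo = b_hi = 0.0
    # Step 2: propagate these bounds through each unrolled update.
    for _ in range(steps):
        o_lo, o_hi = forward(X_lo, X_hi, w_lo, w_hi, b_lo, b_hi)
        r_lo, r_hi = o_lo - y, o_hi - y                 # residual bounds
        g_lo, g_hi = imul(r_lo[:, None], r_hi[:, None], X_lo, X_hi)
        gw_lo, gw_hi = g_lo.mean(axis=0), g_hi.mean(axis=0)
        gb_lo, gb_hi = r_lo.mean(), r_hi.mean()
        # Subtracting an interval swaps its lower and upper ends.
        w_lo, w_hi = w_lo - lr * gw_hi, w_hi - lr * gw_lo
        b_lo, b_hi = b_lo - lr * gb_hi, b_hi - lr * gb_lo
    return w_lo, w_hi, b_lo, b_hi

def certify(x, eps_test, params):
    """Step 3: return the certified label (+1 or -1), or None to abstain."""
    w_lo, w_hi, b_lo, b_hi = params
    o_lo, o_hi = forward((x - eps_test)[None, :], (x + eps_test)[None, :],
                         w_lo, w_hi, b_lo, b_hi)
    if o_lo[0] > 0:
        return 1
    if o_hi[0] < 0:
        return -1
    return None
```

A prediction is certified only when the whole output interval lies on one side of zero, so the label cannot change for any admissible training or test perturbation. In this naive scheme the parameter intervals widen with every update, because interval arithmetic treats the parameters and their gradient as independent, which is one way to see why small budgets matter.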

Paper Structure

This paper contains 38 sections, 34 equations, 4 figures, and 4 tables.

Figures (4)

  • Figure 1: Overview of FullCert. During training, we bound the effects that perturbation of the training data can have on the model. At inference, we combine these bounds with bounds for perturbation of the test data. The result is a certified prediction against training and test time attacks.
  • Figure 2: Barriers to the decision boundaries for all models that could have resulted from poisoning the Two-Moons dataset. Our bounds on the parameters guarantee that all points outside these barriers are robustly classified.
  • Figure 3: Certified accuracy for different initial accuracies on Two-Moons for $\epsilon = 10^{-3}$ and MNIST 1/7 for $\epsilon = 10^{-4}$. Each dot represents a separate model; the blue convex hulls are shown for visualization purposes only. The closer the initialization after pretraining is to the final operating point, the higher the final certified accuracy.
  • Figure 4: Comparison between FullCert and BagFlip. Left: Certified Accuracy for different $\epsilon$ on Two-Moons. The threat model allows perturbations of up to $\epsilon$ for each feature. Right: Certified Accuracy for different $R$ as a percentage of MNIST 1/7 images with one flipped feature or label (FL1 perturbation) as reported in BagFlip.