Exact Certification of Data-Poisoning Attacks Using Mixed-Integer Programming

Philip Sosnin, Jodie Knapp, Fraser Kennedy, Josh Collyer, Calvin Tsay

TL;DR

The paper addresses the certification of gradient-trained models against training-time data-poisoning attacks under a white-box threat model. It introduces a single mixed-integer quadratically constrained programming (MIQCP) formulation that encodes adversarial data manipulation, training dynamics, and test-time evaluation, so that solving it to global optimality yields both a worst-case poisoning attack and an exact robustness guarantee. To improve tractability on small models, the authors develop specialized techniques, including reformulations, pruning heuristics, and optimization-based bound tightening. The experiments demonstrate that exact certification is feasible at this scale and give insight into attack structure and the looseness of relaxations. The resulting global certification framework can guide the design of tighter and more scalable defenses for training-time robustness.

Abstract

This work introduces a verification framework that provides both sound and complete guarantees against data-poisoning attacks during neural network training. We formulate adversarial data manipulation, model training, and test-time evaluation in a single mixed-integer quadratically constrained programming (MIQCP) problem. Finding the global optimum of the proposed formulation provably yields worst-case poisoning attacks, while simultaneously bounding the effectiveness of all possible attacks on the given training pipeline. Our framework encodes both the gradient-based training dynamics and model evaluation at test time, enabling the first exact certification of training-time robustness. Experimental evaluation on small models confirms that our approach delivers a complete characterization of robustness against data poisoning.
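
As a reading aid, the single optimization problem described above can be sketched schematically as follows. The notation is an assumption introduced here for illustration and is not taken from the paper: $D$ is the clean training set, $\mathcal{T}(D)$ the set of poisoned datasets the adversary may choose from, $\theta_t$ the parameters after $t$ gradient steps with learning rate $\eta$, $\mathcal{L}$ the training loss, and $\ell$ the adversary's objective at a test point.

```latex
\begin{aligned}
\max_{\tilde{D},\,\theta_1,\dots,\theta_T}\quad
  & \ell\big(f_{\theta_T}(x_{\mathrm{test}}),\, y_{\mathrm{test}}\big) \\
\text{s.t.}\quad
  & \tilde{D} \in \mathcal{T}(D), \\
  & \theta_{t+1} = \theta_t - \eta\, \nabla_{\theta} \mathcal{L}(\theta_t;\, \tilde{D}),
    \qquad t = 0,\dots,T-1, \\
  & \theta_0 \ \text{fixed.}
\end{aligned}
```

Under this sketch, a global maximizer $\tilde{D}^\star$ is a worst-case poisoning attack, and the optimal objective value bounds what any admissible attack can achieve, which is the sense in which the guarantee is both sound and complete. In a problem of this shape, products between parameters and perturbed data in the gradient updates would typically appear as quadratic constraints, and piecewise-linear activations would typically be encoded with binary variables; the paper's actual encoding may differ.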
