Input Validation for Neural Networks via Runtime Local Robustness Verification
Jiangchao Liu, Liqian Chen, Antoine Mine, Ji Wang
TL;DR
This work addresses neural network reliability under adversarial perturbations by validating inputs at runtime with local robustness verification. It rests on two observations: valid (correctly classified) inputs have substantially larger robustness radii than misclassified or adversarial inputs, and the radii of valid inputs often follow a normal distribution. Building on these, it introduces two validation methods, validation by threshold and validation by distribution, which work with either complete or incomplete verifiers to reject suspicious inputs at runtime and improve accuracy without attack-specific assumptions. Experiments show protection against adversarial examples, particularly those from strong attacks, at a practical runtime cost, which suggests a path toward safer deployment of neural networks in safety-critical settings.
Abstract
Local robustness verification can prove that a neural network is robust with respect to any perturbation of a specific input within a certain distance. We call this distance the robustness radius. We observe that the robustness radii of correctly classified inputs are much larger than those of misclassified inputs, which include adversarial examples, especially those produced by strong adversarial attacks. We also observe that the robustness radii of correctly classified inputs often follow a normal distribution. Based on these two observations, we propose to validate inputs for neural networks via runtime local robustness verification. Experiments show that our approach can protect neural networks from adversarial examples and improve their accuracy.
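The two validation ideas can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy linear classifier, and the tail probability `alpha` are assumptions made for illustration. A real deployment would obtain the radius from a complete or incomplete verifier. For a binary linear classifier sign(w.x + b), the exact L-infinity robustness radius is |w.x + b| / ||w||_1, which keeps the example self-contained.

```python
import math
import statistics


def linear_radius(w, b, x):
    """Exact L-infinity robustness radius of sign(w.x + b) at x.

    Toy stand-in (assumption) for the radius a verifier would report.
    """
    margin = abs(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return margin / sum(abs(wi) for wi in w)


def validate_by_threshold(radius, threshold):
    """Accept the input only if its radius reaches the threshold."""
    return radius >= threshold


def fit_normal(valid_radii):
    """Estimate mean and sample standard deviation of valid inputs' radii."""
    return statistics.mean(valid_radii), statistics.stdev(valid_radii)


def validate_by_distribution(radius, mean, std, alpha=0.05):
    """Accept unless the radius lies in the lower alpha-tail of N(mean, std^2).

    Using a lower-tail test is an assumption of this sketch.
    """
    z = (radius - mean) / std
    lower_tail = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return lower_tail >= alpha


# Hypothetical classifier and inputs.
w, b = [1.0, -1.0], 0.0
far_from_boundary = linear_radius(w, b, [2.0, 0.0])   # margin 2, ||w||_1 = 2
near_boundary = linear_radius(w, b, [0.1, 0.0])       # margin 0.1, ||w||_1 = 2

# Hypothetical radii measured on correctly classified calibration inputs.
mean, std = fit_normal([0.9, 1.0, 1.1, 1.0, 0.95, 1.05])

print(validate_by_threshold(far_from_boundary, 0.5))
print(validate_by_threshold(near_boundary, 0.5))
print(validate_by_distribution(far_from_boundary, mean, std))
print(validate_by_distribution(near_boundary, mean, std))
```

In this sketch the input far from the decision boundary passes both checks, while the input near the boundary, whose radius is small, is rejected by both. The threshold and `alpha` would in practice be calibrated on held-out data.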
