Strengthening the Internal Adversarial Robustness in Lifted Neural Networks

Christopher Zach

TL;DR

This work investigates strengthening internal adversarial robustness in lifted neural networks by (i) relaxing the min-max training objective to make the loss harder without changing parameters, and (ii) incorporating targeted adversarial perturbations within a generalized AROVR framework. It introduces distance-based and reweighted network-potential generalizations ($d_\gamma$ and $U_\theta^{\odot\gamma}$) to shape adversarial regions and analyzes their effects on learning dynamics, including potential meta-learning of hyperparameters. The paper also presents targeted perturbation variants $\mathrm{AROVR}_{\alpha,\beta}^{g}$, explores theoretical properties, and reports empirical results on small datasets showing modest improvements at the cost of slower training. Overall, the approach offers a principled route to increase internal robustness in energy-based lifted architectures, with implications for interpolation-regime generalization and potential connections to biology-inspired robustness. The findings suggest that while robustness can be influenced through loss design, practical gains are sensitive to dataset size, network architecture, and optimization strategy, motivating further work on scalable, certified robustness for lifted models.

Abstract

Lifted neural networks (i.e., neural architectures that explicitly optimize over their respective network potentials to determine the neural activities) can be combined with a type of adversarial training to gain robustness for internal as well as input layers, in addition to improved generalization performance. In this work we first investigate how adversarial robustness in this framework can be further strengthened solely by modifying the training loss. In a second step we fix some remaining limitations and arrive at a novel training loss for lifted neural networks that combines targeted and untargeted adversarial perturbations.
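To make the notion of "optimizing over a network potential to determine the neural activities" concrete, the following is a minimal sketch for a single layer. It assumes a simple quadratic potential with a non-negativity constraint, whose minimizer is known to coincide with a ReLU layer; this is a standard textbook instance and is not claimed to be the potential used in this paper. All function names (`layer_potential`, `infer_activities`) and the step-size and iteration settings are illustrative assumptions.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product W @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(v: Vector) -> Vector:
    """Standard feed-forward ReLU activation, used here only as a reference."""
    return [max(0.0, vi) for vi in v]


def layer_potential(z: Vector, a: Vector) -> float:
    """Assumed single-layer potential: 0.5 * ||z - a||^2, with a = W @ x."""
    return 0.5 * sum((zi - ai) ** 2 for zi, ai in zip(z, a))


def infer_activities(W: Matrix, x: Vector, steps: int = 200, lr: float = 0.5) -> Vector:
    """Infer activities z by projected gradient descent on the potential.

    Minimizes 0.5 * ||z - W x||^2 subject to z >= 0. The gradient with
    respect to z is (z - a); the projection clamps negative entries to 0.
    """
    a = matvec(W, x)
    z = [0.0] * len(a)
    for _ in range(steps):
        z = [max(0.0, zi - lr * (zi - ai)) for zi, ai in zip(z, a)]
    return z


if __name__ == "__main__":
    W = [[1.0, -2.0], [0.5, 0.5]]
    x = [1.0, 1.0]
    z = infer_activities(W, x)
    print("inferred activities:", z)
    print("feed-forward ReLU:  ", relu(matvec(W, x)))
    print("potential at optimum:", layer_potential(z, matvec(W, x)))
```

In this toy setting the inferred activities agree with the feed-forward pass. The point of the lifted view is that the activities are treated as optimization variables, so a training loss can perturb them at internal layers as well as at the input.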

Paper Structure

This paper contains 15 sections, 5 theorems, 56 equations, 1 figure, 1 table.

Key Result

Proposition 3.1

Adversarial training based on \eqref{eq:relaxed_gen_AT} is harder than training via AROVR \eqref{eq:relaxed_simple_AT} in the following sense:

Figures (1)

  • Figure 1: Evolution of Lipschitz estimates (left) and corresponding test accuracy (right, selected using 10,000 validation samples). Top: graphs obtained using only 5,000 samples from the training set. Bottom: training based on 50,000 samples.

Theorems & Definitions (11)

  • Proposition 3.1
  • proof
  • Remark 3.2
  • Proposition 3.3
  • Proposition 4.1
  • Proposition 4.2
  • Remark 4.3
  • Proposition 4.4
  • proof: Proposition \ref{prop:ROVR_AROVR}
  • proof: Proposition \ref{prop:AROVR_strengthening}
  • ...and 1 more