Adversarial Robustness in Parameter-Space Classifiers

Tamir Shor, Ethan Fetaya, Chaim Baskin, Alex Bronstein

TL;DR

This work investigates the adversarial robustness of parameter-space classifiers operating on Implicit Neural Representations (INRs). It formalizes INR fitting and parameter-space classification, and introduces five white-box attacks (Full PGD, Truncated Modulation Optimization, BOTTOM, ICOP, and Implicit Differentiation) plus a voxel-grid attack (BVA) that perturb INR inputs under signal-domain fidelity constraints. Empirical results on 2D (MNIST, Fashion-MNIST) and 3D (ModelNet10) data show that parameter-space classifiers exhibit substantially stronger inherent robustness than traditional signal-domain classifiers, though Full PGD remains computationally challenging in the INR setting. The findings suggest that INR-based downstream tasks can be more reliable under adversarial conditions, albeit with trade-offs in clean accuracy and computational requirements, motivating future work on robust training and scalability to more complex datasets.

Abstract

Implicit Neural Representations (INRs) have recently garnered increasing interest in various research fields, mainly due to their ability to represent large, complex data in a compact and continuous manner. Past work further showed that numerous popular downstream tasks can be performed directly in the INR parameter-space. Doing so can substantially reduce the computational resources required to process the represented data in their native domain. A major difficulty in using modern machine-learning approaches is their high susceptibility to adversarial attacks, which have been shown to greatly limit the reliability and applicability of such methods in a wide range of settings. In this work, we show that parameter-space models trained for classification are inherently robust to adversarial attacks, without the need for any robust training. To support our claims, we develop a novel suite of adversarial attacks targeting parameter-space classifiers, and furthermore analyze practical considerations of attacking parameter-space classifiers.
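
For readers unfamiliar with the PGD family of attacks named in the TL;DR, the following is a minimal sketch of an L-infinity-bounded projected gradient descent attack applied directly in the signal domain. It is not the paper's method: it uses a toy binary logistic classifier in place of the INR fitting step and the parameter-space classifier, and every name in it (`pgd_linf`, `w`, `b`, `eps`, `step`, `iters`) is a hypothetical choice made for illustration.

```python
import numpy as np


def pgd_linf(x, y, w, b, eps=0.1, step=0.02, iters=20):
    """L-inf PGD against a toy logistic classifier p(y=1|x) = sigmoid(w.x + b).

    x: clean input (1-D array); y: true label in {0, 1}.
    Returns an input that stays within eps of x in the L-inf norm.
    All names and defaults are illustrative assumptions.
    """
    x_adv = x.copy()
    for _ in range(iters):
        logit = w @ x_adv + b
        p = 1.0 / (1.0 + np.exp(-logit))
        # Gradient of the binary cross-entropy loss with respect to the input.
        grad = (p - y) * w
        # Take a signed-gradient step that increases the loss.
        x_adv = x_adv + step * np.sign(grad)
        # Project back onto the eps-ball around the clean input.
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv


# Toy usage with random weights and a random input (assumed values).
rng = np.random.default_rng(0)
w = rng.normal(size=16)
b = 0.0
x = rng.normal(size=16)
y = int(w @ x + b > 0)  # use the classifier's own clean prediction as the label
x_adv = pgd_linf(x, y, w, b)
print("max perturbation:", np.max(np.abs(x_adv - x)))
```

The projection step is what enforces the fidelity constraint: however many iterations are run, the perturbed input never leaves the `eps`-ball around the clean signal. In the setting the paper studies, the attacker would additionally have to propagate gradients through the INR fitting procedure, which this sketch omits entirely.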

Paper Structure

This paper contains 34 sections, 6 equations, 9 figures, 2 tables.

Figures (9)

  • Figure 1: Parameter-space classifier adversarial attack pipeline for a single PGD iteration. Orange blocks are optimized; blue blocks remain frozen.
  • Figure 2: Robust accuracy for parameter-space (solid curves) and signal-space (dotted curves) classifiers.
  • Figure 3: Adversarial amplification across pipeline layers, showing modulation optimization's attenuation of adversarial amplification.
  • Figure 4: t-SNE projection of modulation vectors fitted for clean and adversarially perturbed data, for parameter-space and signal-space classifiers.
  • Figure 5: Signal-domain amplification through encoder and downstream classifier layers.
  • ...and 4 more figures