Improving robustness to corruptions with multiplicative weight perturbations
Trung Trinh, Markus Heinonen, Luigi Acerbi, Samuel Kaski
TL;DR
This work introduces Data Augmentation via Multiplicative Perturbation (DAMP), a training method that multiplicatively perturbs neural network weights with random noise to simulate input corruptions and improve robustness across a wide range of distortions. It establishes a theoretical link between input perturbations and multiplicative weight perturbations, and connects Adaptive Sharpness-Aware Minimization (ASAM) to adversarial multiplicative weight perturbations, showing that the two follow similar update dynamics. Empirically, DAMP improves corruption robustness on CIFAR-10/100, TinyImageNet, and ImageNet with ResNet and Vision Transformer architectures, including training a ViT-S/16 from scratch on ImageNet with competitive results, in some cases surpassing more expensive methods such as SAM and ASAM. DAMP can be combined with modern data augmentations (e.g., MixUp, RandAugment) and keeps training cost comparable to standard SGD, offering a practical, scalable approach to robustness in real-world vision systems.
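To make the core idea concrete, below is a minimal PyTorch-style sketch of a single training step under random multiplicative weight perturbations, i.e., a stochastic gradient step on the loss evaluated at noise-scaled weights. This is an illustration only, not the authors' implementation: the Gaussian noise with mean 1, the value of `sigma`, and the `damp_step` helper are assumptions made for this example.

```python
import torch

def damp_step(model, loss_fn, x, y, optimizer, sigma=0.2):
    """Illustrative sketch: one training step under a random multiplicative
    weight perturbation w <- xi * w with xi ~ N(1, sigma^2). Not the paper's
    reference implementation; noise distribution and sigma are assumptions."""
    params = [p for p in model.parameters() if p.requires_grad]

    # Save the clean weights and perturb them in place
    clean, noises = [], []
    with torch.no_grad():
        for p in params:
            clean.append(p.detach().clone())
            xi = 1.0 + sigma * torch.randn_like(p)
            noises.append(xi)
            p.mul_(xi)

    # Forward/backward pass under the perturbed weights
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()

    # Restore the clean weights; by the chain rule, the gradient of L(xi * w)
    # with respect to the clean w is xi * dL/d(xi * w)
    with torch.no_grad():
        for p, w, xi in zip(params, clean, noises):
            p.copy_(w)
            if p.grad is not None:
                p.grad.mul_(xi)

    optimizer.step()
    return loss.detach()
```

In practice `sigma` would be tuned per architecture and dataset; the point of the sketch is only that the perturbation is applied to the weights rather than the inputs, so the extra cost over standard SGD is a single element-wise multiply and restore.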
Abstract
Deep neural networks (DNNs) excel on clean images but struggle with corrupted ones. Incorporating specific corruptions into the data augmentation pipeline can improve robustness to those corruptions but may harm performance on clean images and other types of distortion. In this paper, we introduce an alternative approach that improves the robustness of DNNs to a wide range of corruptions without compromising accuracy on clean images. We first demonstrate that input perturbations can be mimicked by multiplicative perturbations in the weight space. Leveraging this, we propose Data Augmentation via Multiplicative Perturbation (DAMP), a training method that optimizes DNNs under random multiplicative weight perturbations. We also examine the recently proposed Adaptive Sharpness-Aware Minimization (ASAM) and show that it optimizes DNNs under adversarial multiplicative weight perturbations. Experiments on image classification datasets (CIFAR-10/100, TinyImageNet and ImageNet) and neural network architectures (ResNet50, ViT-S/16, ViT-B/16) show that DAMP enhances model generalization performance in the presence of corruptions across different settings. Notably, DAMP is able to train a ViT-S/16 on ImageNet from scratch, reaching a top-1 error of 23.7%, which is comparable to ResNet50 without extensive data augmentations.
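For contrast, the abstract's observation about ASAM can be written in the same multiplicative form: ASAM's adaptive ascent step rescales each weight by a factor of roughly 1 + rho * w * g / ||w * g|| (with g the gradient at the clean weights) before taking the descent gradient. The sketch below is a simplified illustration of such an ASAM-like step, omitting the original method's normalization variants and layer-wise exclusions; `asam_like_step`, `rho`, and `eps` are names chosen for this example, not taken from the paper.

```python
import torch

def asam_like_step(model, loss_fn, x, y, optimizer, rho=0.5, eps=1e-12):
    """Illustrative sketch: one ASAM-like step written as an adversarial
    multiplicative weight perturbation. A simplification for exposition,
    not the original ASAM or the paper's implementation."""
    params = [p for p in model.parameters() if p.requires_grad]

    # Gradient g at the clean weights
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()

    clean = []
    with torch.no_grad():
        # Global norm of w * g across all parameters
        norm = torch.norm(torch.stack(
            [(p * p.grad).norm() for p in params if p.grad is not None]))
        # Adversarial multiplicative perturbation:
        # w <- w * (1 + rho * w * g / ||w * g||)
        for p in params:
            clean.append(p.detach().clone())
            if p.grad is not None:
                p.mul_(1.0 + rho * p * p.grad / (norm + eps))

    # Gradient at the adversarially perturbed weights
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()

    # Restore the clean weights and update with the perturbed-point gradient
    with torch.no_grad():
        for p, w in zip(params, clean):
            p.copy_(w)
    optimizer.step()
    return loss.detach()
```

Note the practical difference visible in the two sketches: the random-perturbation step needs a single forward/backward pass per update, whereas the adversarial (ASAM-style) step needs two, which is why the random variant can keep training cost close to standard SGD.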
