The Effectiveness of Random Forgetting for Robust Generalization

Vijaya Raghavan T Ramkumar, Bahram Zonooz, Elahe Arani

TL;DR

This work introduces FOMO, a learning paradigm that alternates between a forgetting phase, which randomly reinitializes a subset of weights to regulate the information stored in the model, and a relearning phase, which emphasizes learning generalizable features.

Abstract

Deep neural networks are susceptible to adversarial attacks, which can compromise their performance and accuracy. Adversarial Training (AT) has emerged as a popular approach for protecting neural networks against such attacks. However, a key challenge of AT is robust overfitting, where the network's robust performance on test data deteriorates with further training, thus hindering generalization. Motivated by the concept of active forgetting in the brain, we introduce a novel learning paradigm called "Forget to Mitigate Overfitting (FOMO)". FOMO alternates between a forgetting phase, which randomly reinitializes a subset of weights to regulate the information stored in the model, and a relearning phase, which emphasizes learning generalizable features. Our experiments on benchmark datasets and adversarial attacks show that FOMO alleviates robust overfitting by significantly reducing the gap between the best and last robust test accuracy while improving on state-of-the-art robustness. Furthermore, FOMO provides a better trade-off between standard and robust accuracy, outperforming baseline adversarial training methods. Finally, our framework is robust to AutoAttack and improves generalization in many real-world scenarios.
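
As a rough illustration of the two quantities the abstract describes, the NumPy sketch below reinitializes a random fraction of a weight array (the forgetting step) and computes the gap between the best and last robust test accuracy (the robust overfitting measure). It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the Gaussian reinitialization, and the `init_scale` parameter are hypothetical choices made for this example, and the abstract does not specify how forgotten weights are redrawn.

```python
import numpy as np


def forget_random_weights(weights, fraction, rng, init_scale=0.01):
    """Reinitialize a random `fraction` of the entries in `weights`.

    Returns the updated array and a boolean mask marking forgotten entries.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    n_forget = int(round(fraction * weights.size))
    # Pick exactly n_forget positions uniformly at random, without replacement.
    idx = rng.choice(weights.size, size=n_forget, replace=False)
    mask = np.zeros(weights.size, dtype=bool)
    mask[idx] = True
    mask = mask.reshape(weights.shape)
    # Fresh values for the forgotten entries (Gaussian init is an assumption).
    fresh = rng.normal(0.0, init_scale, size=weights.shape)
    return np.where(mask, fresh, weights), mask


def robust_overfitting_gap(robust_test_acc):
    """Best-minus-last robust test accuracy over the training epochs."""
    acc = np.asarray(robust_test_acc, dtype=float)
    return float(acc.max() - acc[-1])
```

In a training loop, a step like `forget_random_weights` would be followed by further adversarial training on the partly reset model (the relearning phase). A small gap from `robust_overfitting_gap` indicates that robust accuracy at the end of training stays close to its peak.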

Paper Structure

This paper contains 26 sections, 4 equations, 5 figures, 10 tables, 1 algorithm.

Figures (5)

  • Figure 1: Schematic of the proposed FOMO framework, illustrating its three pivotal phases during adversarial training (AT). Beginning with standard learning, FOMO sequentially incorporates consolidation, a unique forgetting phase, and a relearning stage. This cyclic process enhances the robustness of the model by addressing adversarial overfitting through active forgetting and relearning.
  • Figure 2: (Left) Robustness to adversarial attacks; (Right) robustness to natural corruptions. In both analyses, FOMO shows a significant performance improvement over the baselines considered.
  • Figure 3: Robustness of the model perturbed by varying degrees of Gaussian noise. Our method is considerably robust to Gaussian perturbations, as the decline in performance is gradual, suggesting convergence to flatter minima.
  • Figure 4: Impact of forgetting 50% of the parameters in each layer on robust generalization, using PreAct-ResNet-18 on CIFAR-10. (Left) Train robust accuracy; (Right) test robust accuracy. Forgetting in the later layers regularizes the train robust accuracy and mitigates robust overfitting more than forgetting in the early layers.
  • Figure 5: Study of the symbiotic relationship between forgetting and relearning during adversarial training.
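
The Gaussian-noise analysis in Figure 3 can be approximated with a simple probe: add zero-mean Gaussian noise of increasing standard deviation to a model's weights and track how accuracy falls. The sketch below does this for a toy linear classifier. It is only an illustration under assumptions: the caption does not say whether the noise is applied to the weights or to the inputs (weights are assumed here, given the remark about flatter minima), and the function names, the linear model, and the `trials` parameter are hypothetical.

```python
import numpy as np


def accuracy(weights, x, y):
    """Accuracy of a linear classifier whose prediction is argmax of x @ weights."""
    return float(np.mean(np.argmax(x @ weights, axis=1) == y))


def noise_sensitivity(weights, x, y, sigmas, rng, trials=20):
    """Mean accuracy after adding N(0, sigma^2) noise to the weights, per sigma."""
    results = {}
    for sigma in sigmas:
        # Average over several noise draws to smooth out sampling variance.
        accs = [
            accuracy(weights + rng.normal(0.0, sigma, size=weights.shape), x, y)
            for _ in range(trials)
        ]
        results[sigma] = float(np.mean(accs))
    return results
```

A gradual decline in the returned accuracies as `sigma` grows is the behavior the caption associates with flatter minima; a sharp drop at small `sigma` would suggest a sharper one.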