Effective and Robust Adversarial Training against Data and Label Corruptions

Peng-Fei Zhang, Zi Huang, Xin-Shun Xu, Guangdong Bai

TL;DR

The paper addresses training under simultaneous data perturbations and label noise, without prior knowledge of the corruption type. It proposes ERAT, a framework that combines hybrid adversarial training over multiple surrogate perturbations with scoring-based, class-rebalancing sample selection and semi-supervised learning that discards noisy labels. Experiments on CIFAR-10/100 and Tiny-ImageNet with multiple architectures report improved robustness and practical training efficiency under dual corruption. The authors position ERAT as a practical defense for learning from imperfect, real-world data.

Abstract

Corruptions due to data perturbations and label noise are prevalent in datasets from unreliable sources and pose significant threats to model training. Despite existing efforts in developing robust models, current learning methods commonly overlook the possible co-existence of both corruptions, limiting the effectiveness and practicability of the model. In this paper, we develop an Effective and Robust Adversarial Training (ERAT) framework to simultaneously handle two types of corruption (i.e., data and label) without prior knowledge of their specifics. We propose hybrid adversarial training over multiple potential adversarial perturbations, alongside semi-supervised learning based on class-rebalancing sample selection, to enhance the resilience of the model to dual corruption. On the one hand, in the proposed adversarial training, the perturbation generation module learns multiple surrogate malicious data perturbations by taking a DNN model as the victim, while the model is trained to maintain semantic consistency between the original data and the hybrid perturbed data. This is expected to enable the model to cope with unpredictable perturbations in real-world data corruption. On the other hand, a class-rebalancing data selection strategy is designed to fairly differentiate clean labels from noisy labels. Semi-supervised learning is then performed by discarding the noisy labels. Extensive experiments demonstrate the superiority of the proposed ERAT framework.
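
To make the class-rebalancing selection idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes each sample already has a "cleanliness" score (higher meaning more likely clean); the paper's actual scoring rule is not given in this summary. The function name class_rebalanced_split and the keep_ratio parameter are hypothetical. Each class receives the same quota of labeled samples, so a class that dominates the high scores cannot crowd out the others.

```python
from collections import defaultdict


def class_rebalanced_split(scores, labels, keep_ratio=0.5):
    """Split sample indices into (labeled, unlabeled) lists.

    scores: per-sample cleanliness score, higher = more likely clean
            (an assumption; the paper's scoring rule is not shown here).
    labels: the given, possibly noisy, class label of each sample.
    keep_ratio: fraction of the dataset to keep as labeled (hypothetical knob).
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        return [], []

    # Group sample indices by their given label.
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    # Equal quota per class instead of one global top-k cut.
    quota = max(1, int(keep_ratio * len(labels) / len(by_class)))

    labeled = []
    for idxs in by_class.values():
        ranked = sorted(idxs, key=lambda i: scores[i], reverse=True)
        labeled.extend(ranked[:quota])

    labeled_set = set(labeled)
    unlabeled = [i for i in range(len(labels)) if i not in labeled_set]
    return sorted(labeled), unlabeled
```

With scores [0.9, 0.8, 0.7, 0.6, 0.2, 0.1] and labels [0, 0, 0, 0, 1, 1], a global top-3 cut would keep only class 0, whereas this sketch keeps the best-scoring sample of each class. The unlabeled indices would then be passed, without their labels, to the semi-supervised learner.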

Paper Structure

This paper contains 20 sections, 10 equations, 8 figures, 6 tables, and 1 algorithm.

Figures (8)

  • Figure 1: An illustration of the ERAT framework. In each epoch, the proposed method first uses a scoring-based class-rebalancing strategy to separate the original dataset into a labeled set containing data with clean labels and an unlabeled set containing data with noisy labels. Next, hybrid adversarial training is performed between the perturbation generation module and the classifier. The perturbation generation module uniformly samples attacking models to produce the most harmful data perturbations by enlarging the semantic gap between the perturbed data and the original data. The classifier is trained via semi-supervised learning to maintain semantic consistency across different augmentation and perturbation views of the original data.
  • Figure 2: A case of non-rebalancing and rebalancing selection for CIFAR-10 under $60\%$ instance-level label noise [song2022learning]. The left and right bars at each class index represent the non-rebalancing and rebalancing selection results, respectively. Without rebalancing, the model is strongly biased towards the majority class and ignores the other classes. The rebalancing strategy effectively rectifies this issue.
  • Figure 3: Test accuracy on CIFAR-10 under different magnitudes of data corruption. As $\epsilon^{\prime}$ approaches 0, only label noise remains. Even in this case, the proposed method outperforms methods designed primarily to defend against label noise, demonstrating its applicability.
  • Figure 4: Test accuracy on CIFAR-10 under different magnitudes of label corruption. As $\kappa$ approaches 0, only data perturbations remain. Even in this case, the proposed method outperforms methods designed primarily to defend against data perturbations, demonstrating its applicability.
  • Figure 5: Test accuracy on CIFAR-10 with different defense budgets against data corruption.
  • ...and 3 more figures
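
The uniform sampling of attacking models described in the Figure 1 caption can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the paper's method: the generators are stand-in callables rather than learned networks, the budget is assumed to be an L-infinity bound (the norm is not stated in this summary), and inputs are assumed to lie in [0, 1]. The name apply_sampled_perturbation is hypothetical.

```python
import random


def _clip(value, low, high):
    """Clamp a scalar to the closed interval [low, high]."""
    return max(low, min(high, value))


def apply_sampled_perturbation(x, generators, eps, rng=random):
    """Perturb x with one uniformly sampled generator, within budget eps.

    x: flat list of input values, assumed to lie in [0, 1].
    generators: callables mapping x to a same-length perturbation
                (stand-ins for learned perturbation generators).
    eps: per-element budget; an L-infinity bound is assumed here.
    rng: any object with a choice() method, e.g. random.Random(seed).
    """
    generator = rng.choice(generators)  # uniform over the candidates
    delta = generator(x)
    if len(delta) != len(x):
        raise ValueError("perturbation must match the input length")
    # Clamp the perturbation to the budget, then keep values in [0, 1].
    return [
        _clip(xi + _clip(di, -eps, eps), 0.0, 1.0)
        for xi, di in zip(x, delta)
    ]
```

In a training loop, the classifier would then be encouraged to predict consistently on x and on the returned perturbed copy. The consistency loss itself is omitted because its form is not given in this summary.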