AdCorDA: Classifier Refinement via Adversarial Correction and Domain Adaptation

Lulan Shen, Ali Edalati, Brett Meyer, Warren Gross, James J. Clark

TL;DR

AdCorDA refines a pretrained classifier by shifting emphasis from weight updates to modifications of the input space, exploiting the duality between network weights and layer inputs. It proceeds in two stages: adversarial correction of the misclassified training samples, followed by domain adaptation (Deep CORAL) that aligns the corrected training distribution back to the original data. The method yields substantial accuracy gains on CIFAR-10/100 (over 5% on CIFAR-100), improves robustness to adversarial attacks, and benefits post-training quantized models, often surpassing baselines while maintaining compact model sizes. Because it is applied on top of an existing pretrained model, it offers a practical way to boost both accuracy and resilience with modest additional computation.
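As a rough illustration of the domain adaptation step, the sketch below computes a CORAL-style loss between two batches of feature vectors. It assumes the commonly used Deep CORAL formulation (squared Frobenius norm of the difference between source and target feature covariances, scaled by 1/(4d^2)); the function names `covariance` and `coral_loss` are illustrative and not taken from the paper, and a real implementation would use a tensor library rather than plain lists.

```python
def covariance(features):
    """Sample covariance matrix (d x d) of a list of n feature vectors."""
    n, d = len(features), len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in features]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def coral_loss(source_features, target_features):
    """Squared Frobenius distance between covariances, scaled by 1 / (4 d^2)."""
    d = len(source_features[0])
    cov_s = covariance(source_features)
    cov_t = covariance(target_features)
    sq_frobenius = sum((cov_s[i][j] - cov_t[i][j]) ** 2
                       for i in range(d) for j in range(d))
    return sq_frobenius / (4 * d * d)
```

In the setting described here, the source features would presumably come from the adversarially corrected training set and the target features from the original training set; the loss is zero when the two batches share the same covariance.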

Abstract

This paper describes a simple yet effective technique for refining a pretrained classifier network. The proposed AdCorDA method is based on modifying the training set and exploiting the duality between network weights and layer inputs. We call this input space training. The method consists of two stages: adversarial correction followed by domain adaptation. Adversarial correction uses adversarial attacks to correct incorrect training-set classifications. The incorrectly classified samples of the training set are removed and replaced with the adversarially corrected samples to form a new training set; in the second stage, domain adaptation is performed back to the original training set. Extensive experimental validation shows significant accuracy boosts of over 5% on the CIFAR-100 dataset. The technique can be straightforwardly applied to the refinement of weight-quantized neural networks, where experiments show substantial enhancement in performance over the baseline. The adversarial correction technique also results in enhanced robustness to adversarial attacks.
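To make the idea of adversarial correction concrete, here is a minimal sketch on a toy linear classifier. It is an assumption-laden illustration, not the paper's procedure: it uses a plain iterative signed-gradient step toward the true class rather than any specific attack from the paper, and the names `logits`, `predict`, `adversarially_correct`, `step`, and `max_steps` are hypothetical.

```python
def logits(weights, x):
    """Logits of a linear classifier: one weight row per class."""
    return [sum(w_j * x_j for w_j, x_j in zip(row, x)) for row in weights]


def predict(weights, x):
    """Index of the largest logit (first index wins on ties)."""
    scores = logits(weights, x)
    return max(range(len(scores)), key=scores.__getitem__)


def sign(v):
    return (v > 0) - (v < 0)


def adversarially_correct(weights, x, true_label, step=0.05, max_steps=100):
    """Nudge x until the classifier outputs true_label; return None on failure."""
    x = list(x)
    for _ in range(max_steps):
        wrong = predict(weights, x)
        if wrong == true_label:
            return x
        # For a linear model, the gradient of (true logit - wrong logit)
        # with respect to x is the difference of the two weight rows.
        grad = [t - w for t, w in zip(weights[true_label], weights[wrong])]
        x = [x_j + step * sign(g) for x_j, g in zip(x, grad)]
    return x if predict(weights, x) == true_label else None
```

Unlike an ordinary adversarial attack, which pushes a correctly classified input toward a wrong label, the perturbation here moves a misclassified input toward its true label. An input that is already classified correctly is returned unchanged.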

Paper Structure (17 sections, 4 equations, 3 figures, 4 tables)

Figures (3)

  • Figure 1: Overview of the proposed AdCorDA classifier refinement method. $T$ is the original training set; $T_c$ is the subset of $T$ that the pretrained network labels correctly, and $T_w$ the subset that is labeled incorrectly; $T_a$ is the set of samples that have been adversarially corrected; $T^\prime$ is the union of $T_c$ and $T_a$. The network is adapted from $T^\prime$ as the source domain back to $T$ as the target domain.
  • Figure 2: Change in the incorrect-class (max) logit and the true-class logit for uncorrected (a,c) and corrected (b,d) CIFAR-100 samples after applying the corrective LL (a,b) and VBI (c,d) attacks on ResNet-34. The vertical dashed lines indicate the mean change in the incorrect-class (max) logit and in the true-class logit.
  • Figure 3: Evaluation of ResNet-34 on the CIFAR-10 dataset. (a) Misclassified images; (b) the difference between the Grad-CAM images for the original and adversarially corrected inputs using the DDN attack, illustrating the shift in focus of the network between the two images; (c) the Grad-CAM image for the original, incorrectly classified image; (d) the Grad-CAM image for the adversarially corrected image.
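
The set notation in the Figure 1 caption can be sketched in code as follows. This is only an illustration under stated assumptions: `predict` and `correct` are hypothetical callables standing in for the pretrained network and the corrective attack, and dropping samples whose correction fails is an assumption made here, not something the captions specify.

```python
def build_corrected_training_set(training_set, predict, correct):
    """Return (T_c, T_w, T_a, T_prime) following the notation of Figure 1.

    training_set: list of (x, label) pairs, i.e. T.
    predict:      hypothetical callable mapping x to a predicted label.
    correct:      hypothetical callable (x, label) -> corrected x, or None
                  when the corrective attack fails.
    """
    # T_c: samples the pretrained network already labels correctly.
    t_c = [(x, y) for x, y in training_set if predict(x) == y]
    # T_w: samples the pretrained network labels incorrectly.
    t_w = [(x, y) for x, y in training_set if predict(x) != y]
    # T_a: adversarially corrected versions of the samples in T_w.
    t_a = []
    for x, y in t_w:
        x_corrected = correct(x, y)
        if x_corrected is not None:  # assumption: failed corrections are dropped
            t_a.append((x_corrected, y))
    # T' is the union of T_c and T_a; it serves as the source domain.
    return t_c, t_w, t_a, t_c + t_a
```

The returned `T_prime` would then act as the source domain and the original `training_set` as the target domain in the adaptation stage.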