EntProp: High Entropy Propagation for Improving Accuracy and Robustness

Shohei Enomoto

TL;DR

EntProp addresses the challenge of achieving high standard accuracy in deep networks while remaining robust to distribution shifts. It separates domains by entropy: high-entropy clean samples are routed through auxiliary batch normalization layers (ABNs), and MixUp and free adversarial training push those samples further from the in-distribution domain. The approach yields higher harmonic accuracy and robustness across multiple datasets and architectures at lower training cost, with particularly strong gains on small datasets. The method improves distributional robustness, but it shows limitations in adversarial robustness and highlights the need for task-specific domain-selection metrics.

Abstract

Despite their impressive performance, deep neural networks (DNNs) struggle to generalize to out-of-distribution domains that differ from those seen in training. In practical applications, it is important for DNNs to have both high standard accuracy and robustness against out-of-distribution domains. One technique that achieves both is disentangled learning with mixture distribution via auxiliary batch normalization layers (ABNs). This technique treats clean and transformed samples as different domains, allowing a DNN to learn better features from mixed domains. However, if we distinguish the domains of the samples based on entropy, we find that some transformed samples are drawn from the same domain as clean samples, so the two sets do not form completely different domains. To generate samples drawn from a completely different domain than clean samples, we hypothesize that transforming clean high-entropy samples to further increase their entropy generates out-of-distribution samples that are much further from the in-distribution domain. On the basis of this hypothesis, we propose high entropy propagation (EntProp), which feeds high-entropy samples to the network that uses ABNs. We introduce two techniques, data augmentation and free adversarial training, that increase entropy and bring the samples further from the in-distribution domain. These techniques do not require additional training costs. Our experimental results show that EntProp achieves higher standard accuracy and robustness with a lower training cost than the baseline methods. In particular, EntProp is highly effective when training on small datasets.
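
The entropy-based selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that entropy means the Shannon entropy of the softmax output, and that a fraction k of each batch (the samples with the highest entropy) is selected for the ABN-applied network. The function names, the use of plain Python lists, and the rounding of k times the batch size are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_high_entropy(batch_logits, k):
    """Return the sorted indices of the highest-entropy samples.

    k is the fraction of the batch to select (hypothetical convention:
    the count is round(k * batch_size)). k = 0 selects nothing and
    k = 1 selects every sample.
    """
    if not 0.0 <= k <= 1.0:
        raise ValueError("k must lie in [0, 1]")
    scores = [entropy(softmax(logits)) for logits in batch_logits]
    n_select = int(round(k * len(scores)))
    # Rank sample indices from highest to lowest predictive entropy.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_select])
```

In a training loop, the selected indices would mark the samples that are transformed further and passed through the ABN path, while the remaining samples use only the main path. How the two paths are wired is not specified in this summary and is left out of the sketch.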

Paper Structure (40 sections, 3 equations, 7 figures, 17 tables, 1 algorithm)

Figures (7)

  • Figure 1: Entropy per epoch when ResNet-18 is trained with MixProp (left) and AdvProp (right) on the CIFAR-100 dataset. Error bars indicate one standard deviation, and lines indicate the average.
  • Figure 2: Overview of baseline methods (left) and EntProp (right). The baseline methods feed clean samples to the MBN and transformed samples to the ABN. EntProp treats the augmented samples as belonging to the in-distribution domain and feeds them to the MBN. EntProp then applies an adversarial attack to the high-entropy samples and feeds the results to the ABN.
  • Figure 3: Average $\mathrm{H_{score}}$ and training cost over all datasets except ImageNet. We plot the relative values with the vanilla training cost as 1.
  • Figure 4: Comparison of high-entropy sample selection to random selection using ResNet-18 on the CIFAR-100 dataset. Error bars indicate one standard error, and lines indicate the average. $k=0$ is the same as vanilla training, and $k=1$ feeds all samples to the ABN-applied network.
  • Figure 5: Entropy per epoch when ResNet-18 is trained with EntProp ($k=0.2, n=1$) (left) and EntProp ($k=0.2, n=5$) (right) on the CIFAR-100 dataset. Error bars indicate one standard deviation, and lines indicate the average.
  • ...and 2 more figures