Revealing the Two Sides of Data Augmentation: An Asymmetric Distillation-based Win-Win Solution for Open-Set Recognition

Yunbing Jia; Xiaoyu Kong; Fan Tang; Yixing Gao; Weiming Dong; Yi Yang

TL;DR

This work tackles the paradox that data augmentation, particularly multi-sample-based augmentation (MSA), boosts closed-set accuracy while harming open-set recognition (OSR). It introduces an asymmetric distillation framework in which the teacher additionally processes raw data, combined with a cross mutual information objective and a smoothed two-hot relabeling scheme that emphasize class-specific features and thereby mitigate the OSR deterioration. In experiments on OSR, semantic shift, and large-scale benchmarks, the approach yields consistent AUROC gains (often 2–4% on Tiny-ImageNet, with competitive results on ImageNet-21K) while preserving or improving closed-set accuracy, and it remains effective on lightweight architectures and on related tasks such as OoD detection. Overall, the method offers a practical win-win strategy for using MSA to improve both closed-set and open-set performance, grounded in a mutual-information view of feature discrimination.
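Since the results above are reported in AUROC, a minimal sketch of that metric may help. It assumes each test sample receives a scalar "known-ness" score (for example, the maximum softmax probability, which is a common baseline and not necessarily the criterion used in the paper); the function name `auroc` and the toy scores are illustrative only.

```python
def auroc(known_scores, unknown_scores):
    """Pairwise AUROC: the probability that a randomly chosen known-class
    sample receives a higher score than a randomly chosen unknown-class
    sample. Ties count as half a win."""
    wins = 0.0
    for k in known_scores:
        for u in unknown_scores:
            if k > u:
                wins += 1.0
            elif k == u:
                wins += 0.5
    return wins / (len(known_scores) * len(unknown_scores))


# Toy scores (hypothetical values, not taken from the paper).
known = [0.9, 0.8, 0.7]
unknown = [0.6, 0.75, 0.3]
print(auroc(known, unknown))
```

The quadratic pairwise loop is chosen for readability; for large test sets a rank-based computation gives the same value more efficiently.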

Abstract

In this paper, we reveal the two sides of data augmentation: enhancements in closed-set recognition correlate with a significant decrease in open-set recognition. Through empirical investigation, we find that multi-sample-based augmentations reduce feature discrimination, thereby weakening the open-set criteria. Although knowledge distillation could repair the features via imitation, mixed features with ambiguous semantics hinder the distillation. To this end, we propose an asymmetric distillation framework that feeds the teacher model extra raw data to enlarge the teacher's benefit. Moreover, a joint mutual information loss and a selective relabel strategy are used to alleviate the influence of hard mixed samples. Our method successfully mitigates the decline in open-set performance and outperforms state-of-the-art methods by 2% to 3% AUROC on the Tiny-ImageNet dataset, and experiments on the large-scale ImageNet-21K dataset demonstrate its generalization.
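To make "multi-sample-based augmentation" and two-hot labels concrete, here is a minimal sketch. It uses mixup-style linear interpolation as a stand-in for MSA (the paper also refers to CutMix), treats samples as flat feature vectors, and applies a generic uniform smoothing term `eps`; the helper names and the smoothing formula are assumptions for illustration, not the paper's exact relabeling rule.

```python
def mix_samples(x_a, x_b, lam):
    """Mixup-style interpolation of two samples given as flat lists.
    lam is the mixing coefficient in [0, 1]."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]


def two_hot_label(class_a, class_b, lam, num_classes, eps=0.0):
    """Two-hot target: mass lam on class_a and (1 - lam) on class_b.
    eps spreads a small uniform mass over all classes (assumed smoothing
    scheme; the paper's exact formula is not given in this section)."""
    target = [0.0] * num_classes
    target[class_a] += lam
    target[class_b] += 1.0 - lam
    return [(1.0 - eps) * t + eps / num_classes for t in target]


# Hypothetical example: mix a class-0 sample with a class-2 sample.
x_mixed = mix_samples([1.0, 0.0], [0.0, 1.0], lam=0.75)
y_mixed = two_hot_label(0, 2, lam=0.75, num_classes=4, eps=0.1)
print(x_mixed)  # the mixed input
print(y_mixed)  # the smoothed two-hot target, which sums to 1
```

In the asymmetric setup described above, the student would see only mixed inputs such as `x_mixed`, while the teacher would additionally receive the raw samples.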

Paper Structure (27 sections, 12 equations, 5 figures, 13 tables)

Introduction
Reveal the Two Sides of MSA
Key Findings from DA and OSR Interplay
MSA Diminishes the Criteria of OSR
Retrieve the Discrepancy by Distillation
Method
Overview
Asymmetric Distillation Framework
Sample Verify and Two-Hot Label Smoothing
Experiments
Experimental Settings
Comparison on OSR Benchmark
Comparison on Semantic Shift Benchmark
Comparison on Large-Scale Benchmark
Comparison on the Light-weight Model
...and 12 more sections

Figures (5)

Figure 1: Illustration of the two sides of data augmentation. Despite the large accuracy gains that augmentations bring, multi-sample-based augmentation (MSA) tends to degrade the model's open-set performance.
Figure 2: (a) Heatmap visualization of the distances among all class pairings on the MNIST dataset. '$k$' denotes the known classes and '$uk$' denotes the unknown classes. The number after the underscore is the ground-truth label. (b) Comparison of $||\Phi_\theta(\text{x})||$ and $||\textbf{W}\,\Phi_\theta(\text{x})||$ under different training paradigms. (c) The teacher's top-2 error rate and over-confident predictions (higher than 95%) over 10000 mixed samples under different mixing coefficients.
Figure 3: The proposed asymmetric distillation framework. Both the student and teacher models receive mixed data as input and perform distillation on $\Phi_\theta(x)$. In addition, the teacher model accepts raw data as input to enlarge its benefit on the mixed inputs. To further reduce the student's activation of non-discriminative features, we filter out the mixed samples that the teacher predicts incorrectly and assign them a revised label for optimization.
Figure 4: Visualizations of the model's feature space on the MNIST dataset. The purple points denote the unknown samples and the remaining colors represent the known classes. (a) Vanilla CNN model. (b) CNN model with CutMix augmentation. (c) CNN model with our training framework.
Figure 5: Uncertainty distribution visualizations for different methods on the Tiny-ImageNet dataset. (a) Vanilla CutMix training framework. (b) Symmetric distillation framework. (c) Our asymmetric distillation framework.
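The two statistics named in the Figure 2(c) caption can be sketched as follows. This is a hedged illustration: it assumes a "top-2 error" means the teacher's two highest-probability classes differ from the two source classes of a mixed sample (with the two sources being distinct classes), and it uses the 95% threshold from the caption. The function name and the toy probabilities are hypothetical.

```python
def top2_error_and_overconfidence(probs, source_pairs, threshold=0.95):
    """Return (top-2 error rate, over-confident rate) for mixed samples.

    probs:        list of per-class probability lists from the teacher
    source_pairs: list of (class_a, class_b) used to build each mixed sample
    threshold:    a prediction counts as over-confident above this value
    """
    top2_errors = 0
    overconfident = 0
    for p, (a, b) in zip(probs, source_pairs):
        # Class indices sorted by descending predicted probability.
        ranked = sorted(range(len(p)), key=lambda c: p[c], reverse=True)
        if set(ranked[:2]) != {a, b}:
            top2_errors += 1
        if max(p) > threshold:
            overconfident += 1
    n = len(probs)
    return top2_errors / n, overconfident / n


# Toy teacher outputs for three samples mixed from classes 0 and 1.
probs = [[0.6, 0.3, 0.1], [0.97, 0.01, 0.02], [0.2, 0.5, 0.3]]
pairs = [(0, 1), (0, 1), (0, 1)]
print(top2_error_and_overconfidence(probs, pairs))
```

Samples flagged by the first check are the kind that the framework in Figure 3 filters out and relabels.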
