Revealing the Two Sides of Data Augmentation: An Asymmetric Distillation-based Win-Win Solution for Open-Set Recognition
Yunbing Jia, Xiaoyu Kong, Fan Tang, Yixing Gao, Weiming Dong, Yi Yang
TL;DR
This work tackles the paradox that data augmentation, particularly multi-sample-based augmentation (MSA), boosts closed-set accuracy while harming open-set recognition (OSR). It introduces an asymmetric distillation framework in which the teacher also processes raw data and is guided by a cross mutual information objective and a smoothed two-hot relabeling scheme to emphasize class-specific features, thereby mitigating OSR deterioration. Through extensive experiments on OSR, semantic shift, and large-scale benchmarks, the approach yields consistent AUROC gains (often 2–4% on Tiny-ImageNet and competitive results on ImageNet-21K) while preserving or improving closed-set accuracy, and demonstrates robustness on lightweight architectures and other tasks like OoD detection. Overall, the method provides a practical win-win strategy to leverage MSA for improved performance across both closed-set and open-set scenarios, with broad applicability and solid theoretical grounding in MI-based feature discrimination.
Abstract
In this paper, we reveal the two sides of data augmentation: improvements in closed-set recognition correlate with a significant decline in open-set recognition. Through empirical investigation, we find that multi-sample-based augmentations reduce feature discrimination, thereby weakening the open-set criteria. Although knowledge distillation could repair the features via imitation, mixed features with ambiguous semantics hinder the distillation. To this end, we propose an asymmetric distillation framework that feeds the teacher model extra raw data to enlarge the benefit of the teacher. Moreover, a joint mutual information loss and a selective relabel strategy are used to alleviate the influence of hard mixed samples. Our method successfully mitigates the decline in open-set recognition and outperforms SOTAs by 2% to 3% AUROC on the Tiny-ImageNet dataset. Experiments on the large-scale ImageNet-21K dataset demonstrate the generalization of our method.
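To make the terms in the abstract concrete, the following is a minimal sketch of a mixup-style multi-sample augmentation and of the asymmetric input routing. It is not the authors' implementation. The function names `mixup_pair` and `asymmetric_batches`, the fixed mixing weight `lam`, and the exact routing (the student sees only mixed samples, the teacher sees mixed plus raw samples) are assumptions made for illustration; the abstract states only that the teacher is fed extra raw data.

```python
def one_hot(label, num_classes):
    """Return a one-hot list for an integer class label."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def mixup_pair(x_a, y_a, x_b, y_b, lam, num_classes):
    """Mix two samples with weight lam (mixup-style; hypothetical helper).

    Returns the mixed input and a soft label with mass on both source
    classes, i.e. a "two-hot" target when y_a != y_b.
    """
    mixed_x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    t_a = one_hot(y_a, num_classes)
    t_b = one_hot(y_b, num_classes)
    mixed_y = [lam * a + (1.0 - lam) * b for a, b in zip(t_a, t_b)]
    return mixed_x, mixed_y


def asymmetric_batches(raw_batch, mixed_batch):
    """Route inputs asymmetrically (assumed routing, not from the source).

    The student receives only the mixed samples; the teacher receives
    the same mixed samples plus the extra raw samples.
    """
    student_inputs = list(mixed_batch)
    teacher_inputs = list(mixed_batch) + list(raw_batch)
    return student_inputs, teacher_inputs
```

For example, mixing a class-0 sample with a class-2 sample at `lam = 0.75` yields a label that places 0.75 on class 0 and 0.25 on class 2. Such a mixed input carries semantics of both classes, which is the ambiguity the abstract identifies as hindering distillation.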
