Robust Representation Learning with Self-Distillation for Domain Generalization
Ankur Singh, Senthilnath Jayavelu
TL;DR
This work tackles domain generalization for vision transformers by introducing Robust Representation Learning with Self-Distillation (RRLD), which combines Intermediate-Block Self-Distillation (IBSD) and Augmentation-Guided Self-Distillation (AGSD). IBSD draws supervision from randomly sampled intermediate blocks, while AGSD uses KL divergence to enforce consistency between predictions on original and AutoAugment-augmented inputs. Both terms are trained under the joint objective $L_{total} = L_{ce} + \lambda L_i + \gamma L_a$, where $L_{ce}$ is the cross-entropy loss, $L_i$ the IBSD loss, and $L_a$ the AGSD loss. Across PACS, OfficeHome, and a Wafer defect dataset, RRLD outperforms state-of-the-art methods, with average accuracy gains of roughly 1.2% to 2.3%, and t-SNE visualizations indicate more domain-invariant representations. Ablation studies confirm the contribution of each distillation component and the stability of the method under fixed hyperparameters. Overall, RRLD offers a simple yet effective boost to transformer-based domain generalization, with practical implications for real-world deployment in diverse environments.
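As an illustration of how the joint objective $L_{total} = L_{ce} + \lambda L_i + \gamma L_a$ could be assembled for a single sample, the following is a minimal sketch in plain Python, not the authors' implementation. Several details are assumptions made for the example: the function names, the default weights `lam` and `gamma`, the use of KL divergence for the intermediate-block term $L_i$, and the direction of both KL terms (final-block prediction as the reference distribution). The summary above only states that AGSD uses KL divergence and that IBSD samples intermediate blocks at random.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy L_ce of one sample against its integer class label."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def rrld_total_loss(final_logits, block_logits, aug_logits, label,
                    lam=0.1, gamma=0.1, rng=random):
    """Hypothetical per-sample L_total = L_ce + lam * L_i + gamma * L_a.

    final_logits: logits from the final block on the original input.
    block_logits: list of logits, one per intermediate block (assumed to
                  come from a shared classifier head).
    aug_logits:   logits from the final block on the augmented input.
    lam, gamma:   illustrative weights, not values from the paper.
    """
    p_final = softmax(final_logits)
    l_ce = cross_entropy(final_logits, label)
    # IBSD (assumed form): compare against one randomly sampled block.
    sampled = rng.choice(block_logits)
    l_i = kl_divergence(p_final, softmax(sampled))
    # AGSD: consistency between original and augmented predictions.
    l_a = kl_divergence(p_final, softmax(aug_logits))
    return l_ce + lam * l_i + gamma * l_a
```

When the intermediate-block and augmented predictions match the final prediction exactly, both distillation terms vanish and the objective reduces to the cross-entropy loss alone. Any disagreement adds a penalty scaled by the two weights.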
Abstract
Despite the recent success of deep neural networks, there remains a need for effective methods to enhance domain generalization using vision transformers. In this paper, we propose a novel domain generalization technique called Robust Representation Learning with Self-Distillation (RRLD), comprising (i) intermediate-block self-distillation and (ii) augmentation-guided self-distillation, to improve the generalization capabilities of transformer-based models on unseen domains. This approach enables the network to learn robust and general features that are invariant to different augmentations and domain shifts while effectively mitigating overfitting to source domains. To evaluate the effectiveness of our proposed method, we perform extensive experiments on the PACS and OfficeHome benchmark datasets, as well as an industrial wafer semiconductor defect dataset. The results demonstrate that RRLD achieves robust and accurate generalization performance. We observe an average accuracy improvement in the range of 1.2% to 2.3% over the state of the art on the three datasets.
