Robust Representation Learning with Self-Distillation for Domain Generalization

Ankur Singh, Senthilnath Jayavelu

TL;DR

This work tackles domain generalization for vision transformers by introducing Robust Representation Learning with Self-Distillation (RRLD), which combines Intermediate-Block Self-Distillation (IBSD) and Augmentation-Guided Self-Distillation (AGSD). IBSD exploits supervision from randomly sampled intermediate blocks, while AGSD uses a KL-divergence loss to enforce consistency between predictions on original and AutoAugment-augmented inputs. Both terms are trained under a joint objective $L_{total} = L_{ce} + \lambda L_i + \gamma L_a$, where $L_{ce}$ is the cross-entropy loss and $\lambda$ and $\gamma$ weight the IBSD loss $L_i$ and the AGSD loss $L_a$. Across PACS, OfficeHome, and an industrial wafer defect dataset, RRLD outperforms state-of-the-art methods, with average accuracy gains of 1.2% to 2.3%, and yields more domain-invariant representations, as evidenced by t-SNE visualizations. Ablation studies confirm the contribution of each distillation component and the stability of the method under fixed hyperparameters. Overall, RRLD offers a simple yet effective boost to transformer-based domain generalization, with practical implications for real-world deployment in diverse environments.
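To make the joint objective concrete, here is a minimal sketch in plain Python for a single example. It is not the authors' implementation. The summary only states that AGSD uses KL divergence, so the KL form of the IBSD term, the direction of both KL terms, the `temperature` parameter, and the default values of `lam` and `gamma` are all assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """L_ce: negative log-probability of the true class."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def rrld_total_loss(final_logits, inter_logits, aug_logits, label,
                    lam=0.1, gamma=0.1, temperature=1.0):
    """L_total = L_ce + lam * L_i + gamma * L_a for one example.

    final_logits: prediction of the last block on the original input
    inter_logits: prediction of a randomly chosen intermediate block
    aug_logits:   prediction of the last block on the augmented input
    """
    l_ce = cross_entropy(final_logits, label)
    p_final = softmax(final_logits, temperature)
    # Assumption: IBSD pulls the intermediate prediction toward the final one.
    l_i = kl_divergence(p_final, softmax(inter_logits, temperature))
    # Assumption: AGSD treats the augmented-input prediction as the target.
    l_a = kl_divergence(softmax(aug_logits, temperature), p_final)
    return l_ce + lam * l_i + gamma * l_a
```

When the three predictions agree, both distillation terms vanish and the objective reduces to plain cross-entropy. Any disagreement between them adds a non-negative penalty scaled by `lam` and `gamma`.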

Abstract

Despite the recent success of deep neural networks, there remains a need for effective methods to enhance domain generalization using vision transformers. In this paper, we propose a novel domain generalization technique called Robust Representation Learning with Self-Distillation (RRLD) comprising i) intermediate-block self-distillation and ii) augmentation-guided self-distillation to improve the generalization capabilities of transformer-based models on unseen domains. This approach enables the network to learn robust and general features that are invariant to different augmentations and domain shifts while effectively mitigating overfitting to source domains. To evaluate the effectiveness of our proposed method, we perform extensive experiments on PACS and OfficeHome benchmark datasets, as well as an industrial wafer semiconductor defect dataset. The results demonstrate that RRLD achieves robust and accurate generalization performance. We observe an average accuracy improvement in the range of 1.2% to 2.3% over the state-of-the-art on the three datasets.

Paper Structure

This paper contains 13 sections, 3 equations, 6 figures, 5 tables, 1 algorithm.

Figures (6)

  • Figure 1: RRLD: The model processes an input image $x$, which is first transformed by AutoAugment to produce $x_a$. The image $x$ is then passed through the network to produce the output $b_n(x)$. A random intermediate block is selected from the network to obtain $b_i(x)$. Simultaneously, image $x_a$ is passed through the network to generate $b_n(x_a)$, with gradient computation halted during this pass (see Algorithm 1). The losses are then computed between the three outputs.
  • Figure 2: Wafer Dataset Images
  • Figure 3: Noisy Wafer Dataset Images
  • Figure 4: Class-wise t-SNE plots obtained from ERM-SDViT (left) and RRLD (right) on the PACS dataset. ERM-SDViT exhibits some overlap between the dog and horse classes (red regions), whereas RRLD separates these classes well (red regions).
  • Figure 5: Domain-wise t-SNE plots for ERM-SDViT (left) and RRLD (right) on the PACS dataset. In the t-SNE plot of ERM-SDViT, the sketch domain is separated from the other domains (highlighted in red), indicating challenges in aligning feature representations. RRLD shows better overlap among the domains, achieving a domain-invariant feature space.
  • ...and 1 more figure
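
The data flow described in the Figure 1 caption can be sketched in plain Python as follows. This is a toy illustration rather than the paper's Algorithm 1. The helper names `block_outputs` and `rrld_outputs` are hypothetical, the blocks are simple scalar functions standing in for transformer blocks, and `toy_augment` is a stand-in for AutoAugment. A real model would also map each block output to class predictions and would wrap the augmented pass in the framework's stop-gradient mechanism.

```python
import random


def block_outputs(blocks, x):
    """Apply the blocks in sequence and return the output of every block."""
    outputs = []
    hidden = x
    for block in blocks:
        hidden = block(hidden)
        outputs.append(hidden)
    return outputs


def rrld_outputs(blocks, augment, x, rng):
    """Return b_n(x), b_i(x), b_n(x_a) and the sampled block index i."""
    if len(blocks) < 2:
        raise ValueError("need at least one intermediate block and a final block")
    x_a = augment(x)                      # augmented view of the input
    outputs = block_outputs(blocks, x)    # pass on the original input
    i = rng.randrange(len(blocks) - 1)    # random block, final block excluded
    b_n_x = outputs[-1]
    b_i_x = outputs[i]
    # Pass on the augmented input; gradients would be halted here in a
    # framework with automatic differentiation (not modelled in this toy code).
    b_n_xa = block_outputs(blocks, x_a)[-1]
    return b_n_x, b_i_x, b_n_xa, i


# Toy stand-ins (hypothetical): three scalar "blocks" and a shift "augmentation".
toy_blocks = [lambda h: h + 1, lambda h: h * 2, lambda h: h - 3]
toy_augment = lambda x: x + 10

b_n_x, b_i_x, b_n_xa, i = rrld_outputs(toy_blocks, toy_augment, 1, random.Random(0))
```

The three returned outputs are the quantities between which the losses are computed. Sampling a new index `i` on every call is what exposes different intermediate blocks to supervision over the course of training.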