Boosting Federated Domain Generalization: Understanding the Role of Advanced Pre-Trained Architectures

Avi Deb Raha, Apurba Adhikary, Mrityunjoy Gain, Yu Qiao, Choong Seon Hong

TL;DR

This work tackles Federated Domain Generalization (FDG) by evaluating recent pre-trained architectures (Vision Transformers, Swin Transformers, and ConvNeXt) paired with large-scale pretraining datasets. It compares self-supervised learning (SSL) and supervised pretraining in a federated setting on Office-Home and PACS, and shows that SSL methods such as BEiT can yield robust cross-domain representations, while ConvNeXt models pretrained on ImageNet-22K achieve state-of-the-art FDG performance (average accuracy of 84.46% on Office-Home and 92.55% on PACS). The study also finds that model depth and pretraining data both affect results, with some smaller, well-pretrained variants outperforming larger ResNets. These results set new FDG benchmarks and offer practical guidance for deploying privacy-preserving, cross-domain federated learning (FL) with advanced architectures and diverse pretraining strategies.

Abstract

In this study, we explore the efficacy of advanced pre-trained architectures, such as Vision Transformers (ViT), ConvNeXt, and Swin Transformers, in enhancing Federated Domain Generalization (FDG). These architectures capture global contextual features and model long-range dependencies, making them promising candidates for improving cross-domain generalization. We conduct a broad study with in-depth analysis and systematically evaluate different variants of these architectures, using extensive pre-training datasets such as ImageNet-1K, ImageNet-21K, JFT-300M, and ImageNet-22K. Additionally, we compare self-supervised and supervised pre-training strategies to assess their impact on FDG performance. Our findings suggest that self-supervised techniques, which focus on reconstructing masked image patches, can better capture the intrinsic structure of images, thereby outperforming their supervised counterparts. Comprehensive evaluations on the Office-Home and PACS datasets demonstrate that adopting advanced architectures pre-trained on larger datasets establishes new benchmarks, achieving average accuracies of 84.46% and 92.55%, respectively. We also observe that certain variants of these advanced models, despite having fewer parameters, outperform larger ResNet models. This highlights the critical role of sophisticated architectures and diverse pre-training strategies in enhancing FDG performance, especially in scenarios with limited computational resources where model efficiency is crucial. Our results indicate that federated learning systems can become more adaptable and efficient by leveraging these advanced methods, offering valuable insights for future research in FDG.

Paper Structure

This paper contains 25 sections, 14 equations, 14 figures, and 6 tables.

Figures (14)

  • Figure 1: Overview of the federated training process.
  • Figure 2: Federated Domain Generalization (FDG) Framework: Leveraging Pre-trained Models for Cross-Domain Adaptation. This illustration captures the core of our FDG strategy, where pre-trained SOTA models are distributed to clients for local adaptation. Each client fine-tunes the model with local data, while the central server aggregates updates to build a robust global model, enabling powerful domain generalization across diverse environments.
  • Figure 3: t-SNE visualization of feature embeddings extracted from images (a) in the Office-Home dataset and (b) in the PACS dataset using a pre-trained MobileNetV2 model. The plot represents the distribution of image features across different classes after dimensionality reduction to two components, highlighting the separability and clustering of data points corresponding to different domains.
  • Figure 4: t-SNE plots for feature representations from different models: BEiT pretraining, ConvNeXt-B pretrained on ImageNet-1K, and ConvNeXt-B pretrained on ImageNet-22K for the Office-Home dataset. The ConvNeXt-B model pretrained on ImageNet-22K demonstrates better clustering and distinctiveness of features compared to the other models, suggesting a more effective feature space adaptation, which likely contributes to its superior performance in DG tasks.
  • Figure 5: t-SNE plots for feature representations from different models: BEiT pretraining, ConvNeXt-B pretrained on ImageNet-1K, and ConvNeXt-B pretrained on ImageNet-22K for the PACS dataset. The ConvNeXt-B model pretrained on ImageNet-22K demonstrates better clustering and distinctiveness of features compared to the other models, suggesting a more effective feature space adaptation, which likely contributes to its superior performance in DG tasks.
  • ...and 9 more figures
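
The round described in the Figure 2 caption (the server distributes a pre-trained model, each client fine-tunes it on local data, and the server aggregates the updates) can be illustrated with a minimal sketch. It assumes a FedAvg-style weighted average, which the text above does not confirm as the paper's aggregation rule. The names `Params`, `local_update`, and `fedavg`, the flat list-of-floats parameter format, and the toy gradients are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical parameter format: layer name -> flat list of weights.
Params = Dict[str, List[float]]


def local_update(global_params: Params, gradient: Params, lr: float = 0.1) -> Params:
    """One illustrative local fine-tuning step: plain gradient descent on a client."""
    return {
        name: [w - lr * g for w, g in zip(weights, gradient[name])]
        for name, weights in global_params.items()
    }


def fedavg(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Aggregate client models, weighting each by its local sample count.

    Assumption: FedAvg-style averaging; the paper may use a different rule.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need exactly one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    aggregated: Params = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        aggregated[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return aggregated


# One communication round with two clients holding different source domains.
pretrained: Params = {"head": [0.0, 0.0]}            # stand-in for pre-trained weights
toy_gradients = [{"head": [1.0, -1.0]}, {"head": [-2.0, 0.5]}]  # made-up values
sizes = [100, 300]                                    # made-up local dataset sizes

client_models = [local_update(pretrained, g) for g in toy_gradients]
new_global = fedavg(client_models, sizes)
```

In this setup only model parameters leave a client, never raw images, which is the privacy property the federated setting relies on. Weighting by sample count is one common choice; uniform weights would be an equally simple alternative.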