Boosting Federated Domain Generalization: Understanding the Role of Advanced Pre-Trained Architectures
Avi Deb Raha, Apurba Adhikary, Mrityunjoy Gain, Yu Qiao, Choong Seon Hong
TL;DR
This work tackles Federated Domain Generalization (FDG) by evaluating next-generation pre-trained architectures (Vision Transformers, Swin Transformers, and ConvNeXt) coupled with large-scale pre-training datasets. It compares self-supervised and supervised pre-training in a federated setting on Office-Home and PACS, and shows that self-supervised methods such as BEiT can yield robust cross-domain representations, while ConvNeXt models pre-trained on ImageNet-22K achieve state-of-the-art FDG performance (Office-Home: 84.46%, PACS: 92.55%). The study also reveals nuanced effects of architecture depth and pre-training data, with smaller yet well-pre-trained variants outperforming larger ResNets. These results establish new FDG benchmarks and offer practical guidance for deploying privacy-preserving, cross-domain federated learning (FL) with advanced architectures and diverse pre-training strategies.
Abstract
In this study, we explore the efficacy of advanced pre-trained architectures, such as Vision Transformers (ViT), ConvNeXt, and Swin Transformers, in enhancing Federated Domain Generalization (FDG). These architectures capture global contextual features and model long-range dependencies, making them promising candidates for improving cross-domain generalization. We conduct a broad study with in-depth analysis and systematically evaluate different variants of these architectures, using extensive pre-training datasets such as ImageNet-1K, ImageNet-21K, JFT-300M, and ImageNet-22K. Additionally, we compare self-supervised and supervised pre-training strategies to assess their impact on FDG performance. Our findings suggest that self-supervised techniques, which focus on reconstructing masked image patches, can better capture the intrinsic structure of images, thereby outperforming their supervised counterparts. Comprehensive evaluations on the Office-Home and PACS datasets demonstrate that adopting advanced architectures pre-trained on larger datasets establishes new benchmarks, achieving average accuracies of 84.46% and 92.55%, respectively. We also observe that certain variants of these advanced models, despite having fewer parameters, outperform larger ResNet models. This highlights the critical role of sophisticated architectures and diverse pre-training strategies in enhancing FDG performance, especially in scenarios with limited computational resources where model efficiency is crucial. Our results indicate that federated learning systems can become more adaptable and efficient by leveraging these advanced methods, offering valuable insights for future research in FDG.
