FedPartWhole: Federated domain generalization via consistent part-whole hierarchies
Ahmed Radwan, Mohamed S. Shehata
TL;DR
This work tackles Federated Domain Generalization (FedDG) under data-privacy constraints by proposing CCNet, a lightweight backbone that explicitly models part–whole hierarchies through a scene parse tree. Each spatial patch is processed by a cortical-column-like unit that builds multi-level latent representations. These representations are initialized from SAM-based masks and a downsized MaxViT encoder and then refined by the BU, TD, I, and Attention modules. Empirically, CCNet-based backbones generalize strongly on the PACS and VLCS benchmarks while using fewer parameters than CNN baselines, outperforming them by over 12 percentage points in some settings. They also offer natural interpretability through islands of agreement. The approach is architecture-centric and orthogonal to existing FedDG strategies. It improves generalization without centralized data or heavy pretraining and suggests a path toward more explainable federated models. Overall, the paper demonstrates that encoding compositional scene structure in the backbone can substantially improve cross-domain generalization in privacy-preserving federated learning.
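The TL;DR describes each patch as a column that holds one embedding per hierarchy level. Each embedding is refined by contributions from neighbouring levels and from other columns. The sketch below shows what one such refinement step could look like in a GLOM-style formulation. It is an assumption for illustration, not CCNet's actual implementation. The sketch makes several readings explicit:

- BU and TD are read as bottom-up and top-down.
- "I" is treated as the level's previous state carried forward.
- Each learned network is replaced by a single tanh-linear map.
- The names `column_step`, `W_bu`, `W_td`, and `beta` are hypothetical.
- Random embeddings stand in for the SAM/MaxViT initialization.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def column_step(state, W_bu, W_td, beta=1.0):
    """One GLOM-style refinement step for all columns (hypothetical sketch).

    state: (N, L, D) array; N patches/columns, L levels, D embedding dims.
    W_bu[l]: (D, D) stand-in for the bottom-up net feeding level l from l-1.
    W_td[l]: (D, D) stand-in for the top-down net feeding level l from l+1.
    """
    N, L, D = state.shape
    new = np.empty_like(state)
    for l in range(L):
        x = state[:, l]                       # (N, D) level-l embeddings
        terms = [x]                           # assumed "I": previous state kept
        if l > 0:                             # bottom-up from the level below
            terms.append(np.tanh(state[:, l - 1] @ W_bu[l]))
        if l < L - 1:                         # top-down from the level above
            terms.append(np.tanh(state[:, l + 1] @ W_td[l]))
        att = softmax(beta * (x @ x.T), axis=1)  # (N, N) same-level attention
        terms.append(att @ x)                 # pulls similar columns together
        new[:, l] = np.mean(terms, axis=0)    # equal-weight average of terms
    return new


# Toy run: random embeddings stand in for the SAM/MaxViT initialization.
rng = np.random.default_rng(0)
N, L, D = 16, 3, 8
state = rng.normal(size=(N, L, D))
W_bu = rng.normal(scale=0.3, size=(L, D, D))
W_td = rng.normal(scale=0.3, size=(L, D, D))
for _ in range(5):
    state = column_step(state, W_bu, W_td)
print(state.shape)  # (16, 3, 8)
```

Equal-weight averaging of the terms is the simplest possible choice. A learned or per-term weighting would be a natural alternative. The same-level attention term is what lets neighbouring columns converge toward shared vectors, which is the behaviour that produces islands of agreement.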
Abstract
Federated Domain Generalization (FedDG) aims to generalize to unseen domains at test time while respecting data-privacy constraints. These constraints prevent data from different domains, held by different clients, from being stored centrally. Existing approaches fall broadly into four groups: domain alignment, data manipulation, learning strategies, and optimization of model aggregation weights. This paper proposes a novel approach that tackles FedDG from the perspective of the backbone model architecture. The core principle is that objects maintain a consistent hierarchical structure of parts and wholes, even under substantial domain shifts and appearance variations. For instance, a photograph and a sketch of a dog share the same hierarchical organization, consisting of a head, body, limbs, and so on. The introduced architecture explicitly incorporates a feature representation of the image parse tree. To the best of our knowledge, this is the first work to tackle FedDG from a model architecture standpoint. Our approach outperforms a convolutional architecture of comparable size by over 12% despite using fewer parameters. Additionally, unlike black-box CNNs, it is inherently interpretable. This interpretability fosters trust in its predictions, a crucial asset in federated learning.
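In GLOM-style models, a node of the image parse tree is represented by an "island" of columns whose embeddings agree at one level. The node's parent is the island it falls inside one level up. The sketch below reads such a tree off column embeddings under stated assumptions, and it is not claimed to be the paper's procedure:

- Agreement means cosine similarity at or above a hypothetical threshold `tau`.
- Islands are connected components of the resulting agreement graph.
- Each island's parent is chosen by majority vote.
- The functions `islands` and `parse_tree` are hypothetical names.

The toy input mirrors the abstract's dog example: two parts (head, body) and one whole (dog).

```python
import numpy as np


def islands(level_emb, tau=0.95):
    """Label columns whose level embeddings agree (cosine >= tau).

    Islands are the connected components of the agreement graph.
    Assumes no all-zero embeddings (they cannot be normalized).
    """
    x = level_emb / np.linalg.norm(level_emb, axis=1, keepdims=True)
    sim = x @ x.T
    labels = -np.ones(len(x), dtype=int)
    current = 0
    for i in range(len(x)):
        if labels[i] >= 0:
            continue
        labels[i] = current
        stack = [i]
        while stack:  # depth-first flood fill over agreeing columns
            j = stack.pop()
            for k in np.nonzero((sim[j] >= tau) & (labels < 0))[0]:
                labels[k] = current
                stack.append(k)
        current += 1
    return labels


def parse_tree(state, tau=0.95):
    """state: (N, L, D). Returns per-level island labels and parent links.

    parents[(l, island)] is the level-(l+1) island that contains most of
    that island's columns (majority vote).
    """
    _, L, _ = state.shape
    labels = [islands(state[:, l], tau) for l in range(L)]
    parents = {}
    for l in range(L - 1):
        for isl in np.unique(labels[l]):
            members = labels[l] == isl
            votes = np.bincount(labels[l + 1][members])
            parents[(l, int(isl))] = int(votes.argmax())
    return labels, parents


# Toy scene with 4 patches and 2 levels (parts, whole): patches 0-1 agree on
# a "head" part, patches 2-3 on a "body" part, and all four on one "dog".
state = np.zeros((4, 2, 2))
state[:2, 0] = [1.0, 0.0]
state[2:, 0] = [0.0, 1.0]
state[:, 1] = [1.0, 1.0]
labels, parents = parse_tree(state)
print(labels[0], labels[1])  # [0 0 1 1] [0 0 0 0]
print(parents)               # {(0, 0): 0, (0, 1): 0}
```

Both part islands link to the single whole-level island, which is the two-level tree "dog → {head, body}". With learned embeddings, `tau` trades off merging similar parts against splitting one part into several islands.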
