FedPartWhole: Federated domain generalization via consistent part-whole hierarchies

Ahmed Radwan, Mohamed S. Shehata

TL;DR

This work tackles Federated Domain Generalization under data-privacy constraints by proposing CCNet, a lightweight backbone that explicitly models part–whole hierarchies via a scene parse tree. Each spatial patch is processed by a cortical-column-like unit that builds multi-level latent representations, initialized from SAM-based masks and a downsized MaxViT encoder, and refined through BU/TD/I/Attention modules. Empirically, CCNet-based backbones yield strong generalization across the PACS and VLCS benchmarks with fewer parameters than CNN baselines, outperforming them by over 12 percentage points in some settings and offering natural interpretability through islands of agreement. The approach is architecture-centric and orthogonal to existing FedDG strategies, enabling improved generalization without centralized data or heavy pretraining, and suggesting a path toward more explainable federated models. Overall, the paper demonstrates that encoding compositional scene structure in the backbone can substantially enhance cross-domain generalization in privacy-preserving federated learning settings.

Abstract

Federated Domain Generalization (FedDG) aims to generalize to unseen domains at test time while respecting the data privacy constraints that prevent data from different domains, held by different clients, from being stored centrally. Existing approaches can be broadly categorized into four groups: domain alignment, data manipulation, learning strategies, and optimization of model aggregation weights. This paper proposes a novel approach to Federated Domain Generalization that tackles the problem from the perspective of the backbone model architecture. The core principle is that objects, even under substantial domain shifts and appearance variations, maintain a consistent hierarchical structure of parts and wholes. For instance, a photograph and a sketch of a dog share the same hierarchical organization, consisting of a head, body, limbs, and so on. The introduced architecture explicitly incorporates a feature representation for the image parse tree. To the best of our knowledge, this is the first work to tackle Federated Domain Generalization from a model architecture standpoint. Our approach outperforms a convolutional architecture of comparable size by over 12%, despite using fewer parameters. Additionally, it is inherently interpretable, in contrast to the black-box nature of CNNs, which fosters trust in its predictions, a crucial asset in federated learning.

Paper Structure (20 sections, 1 equation, 4 figures, 3 tables)

Figures (4)

  • Figure 1: Motivation for the approach. Objects maintain their compositional hierarchical structure despite changes in appearance, which aids domain generalization. The left side shows the parse tree of a picture of a dog; the right side shows the parse tree of a sketch of a dog. Both images have the same parse tree despite the large discrepancy in appearance.
  • Figure 2: The CCNet architecture. a) The input image. b) The input tokenizer. c) The initialization pipeline. d) The tokenized input. e) The hidden representation at $t = 0$. f) The network architecture. g) The hidden representation at $t = 1$. h) The softmax classification output. i) The structure of the classification head.
  • Figure 3: A sample of the PACS dataset from its four domains, illustrating the challenge of domain generalization under significant domain shift.
  • Figure 4: Islands of agreement formed at $t = 0$ in CCNet. The rightmost columns show higher levels in the hierarchy, with the last column being the input image. Images belonging to different domains exhibit a similar hierarchical structure.
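
The islands of agreement mentioned in the TL;DR and in Figure 4 can be pictured with a small sketch. The summary does not say how CCNet forms or visualizes its islands, so everything below is an assumption made for illustration: each patch holds one embedding vector at a given level, neighbouring patches are taken to "agree" when their cosine similarity reaches a threshold, and an island is a connected group of agreeing patches. The names `cosine`, `islands_of_agreement`, and `threshold` are hypothetical and do not come from the paper.

```python
import math
from collections import deque


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def islands_of_agreement(grid, threshold=0.9):
    """Label connected groups of patches whose neighbouring embeddings agree.

    grid[r][c] is the embedding of the patch at row r, column c for ONE level
    of the hierarchy. Agreement is an illustrative assumption: cosine
    similarity >= threshold between 4-connected neighbours.
    """
    rows, cols = len(grid), len(grid[0])
    labels = [[-1] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] != -1:
                continue  # patch already belongs to an island
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill from the seed patch
                cr, cc = queue.popleft()
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if inside and labels[nr][nc] == -1:
                        if cosine(grid[cr][cc], grid[nr][nc]) >= threshold:
                            labels[nr][nc] = next_label
                            queue.append((nr, nc))
            next_label += 1
    return labels


# Toy 2x3 grid with two distinct embeddings, standing in for two object parts.
a = [1.0, 0.0]
b = [0.0, 1.0]
toy = [[a, a, b],
       [a, b, b]]
labels = islands_of_agreement(toy)
print(labels)  # [[0, 0, 1], [0, 1, 1]]
```

Under these assumptions, a photograph and a sketch of the same object would be expected to yield similarly shaped label maps at a given level even when the embeddings themselves differ, which is the kind of cross-domain similarity Figure 4 describes. Because the flood fill links patches through chains of neighbours, two patches in the same island need not be directly similar to each other.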