FedSDWC: Federated Synergistic Dual-Representation Weak Causal Learning for OOD

Zhenyuan Huang, Hui Zhang, Wenzhong Tang, Haijun Yang

TL;DR

This work addresses federated learning (FL) under non-IID client data and out-of-distribution (OOD) scenarios by introducing FedSDWC, a weak causal learning framework that fuses invariant and variant features through a shallow causal link from variant to invariant factors. The method combines ELBO-based variational learning, an intervention-based consistency loss, and a hybrid architecture with Fourier augmentation and Gaussian mixture latent inference, with client models aggregated via FedAvg. The authors derive a generalization bound that links FL performance to client priors and report state-of-the-art results on CIFAR-10/100 and TinyImageNet for both OOD generalization and OOD detection, including robustness under covariate and semantic shifts. Overall, FedSDWC pairs a theoretical guarantee with empirical gains for privacy-preserving FL.

Abstract

Amid growing demands for data privacy and advances in computational infrastructure, federated learning (FL) has emerged as a prominent distributed learning paradigm. Nevertheless, differences in data distribution (such as covariate and semantic shifts) severely affect its reliability in real-world deployments. To address this issue, we propose FedSDWC, a causal inference method that integrates both invariant and variant features. FedSDWC infers causal semantic representations by modeling the weak causal influence between invariant and variant features, effectively overcoming the limitations of existing invariant learning methods in accurately capturing invariant features and directly constructing causal representations. This approach significantly enhances FL's ability to generalize and detect OOD data. Theoretically, we derive FedSDWC's generalization error bound under specific conditions and, for the first time, establish its relationship with client prior distributions. Moreover, extensive experiments conducted on multiple benchmark datasets validate the superior performance of FedSDWC in handling covariate and semantic shifts. For example, FedSDWC outperforms FedICON, the next best baseline, by an average of 3.04% on CIFAR-10 and 8.11% on CIFAR-100.
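The TL;DR states that client models are aggregated via FedAvg. As a rough illustration of that standard aggregation step only (not of FedSDWC itself), the sketch below averages client parameters with weights proportional to local sample counts. The function name `fedavg`, the `Params` type, and the representation of a model as a dictionary of flat float lists are assumptions made for this example; they do not come from the paper.

```python
from typing import Dict, List

# Hypothetical representation: each model is a dict mapping a parameter
# name to a flat list of floats. Real systems would use tensors.
Params = Dict[str, List[float]]


def fedavg(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Average client parameters, weighting each client by its sample count."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client, and at least one client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    averaged: Params = {}
    for name, reference in client_params[0].items():
        # Weighted sum over clients for every coordinate of this parameter.
        averaged[name] = [
            sum(params[name][i] * n for params, n in zip(client_params, client_sizes)) / total
            for i in range(len(reference))
        ]
    return averaged


if __name__ == "__main__":
    clients = [{"w": [0.0, 2.0]}, {"w": [4.0, 6.0]}]
    sizes = [1, 3]  # the second client holds three times as much data
    print(fedavg(clients, sizes))
```

Weighting by sample count means a client with more local data pulls the global model further toward its own parameters, which is one reason heterogeneous (non-IID) client data can bias the aggregate.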

Paper Structure

This paper contains 32 sections, 37 equations, 12 figures, 9 tables, 1 algorithm.

Figures (12)

  • Figure 1: Taking the task of identifying cats and dogs as an example, FL faces three major data challenges in the real world. (1) In-Distribution (ID) Data refers to the training data from participating clients. This data is often heterogeneous; for example, different clients might have images of dogs with a grass background or cats with a snow background, leading to non-identical class or feature distributions across the overall training data. (2) Covariate-Shifted OOD Data refers to data from non-participating clients or data from participating clients that was not used for training, where the feature distribution has changed. For example, the dog’s background shifts from grass to snow, and the cat’s background shifts from snow to grass. (3) Semantic-Shifted OOD Data refers to the emergence of new categories not present in the training set, such as cows and horses.
  • Figure 2: Comparison of different causal structures: (a) The observed variables $x$ and $y$ are influenced solely by the latent variable $z$, forming a simple causal structure. (b) Building on the original structure, an additional latent variable $s$ is introduced, assuming either that $y$ is influenced solely by $s$ or that both $z$ and $s$ are influenced by a deeper latent variable $c$. (c) FedSDWC improves on the existing models by decomposing $x$ into invariant features $x_s$ and environment-related variant features $x_z$, which are controlled by $s$ and $z$, respectively. When inferring $s$, we consider both the invariant features $x_s$ and the weak causal influence of $z$. Finally, $c$ is derived from $s$, and $x_s$ is used to infer $p(y|c, x_s)$.
  • Figure 3: Framework of FedSDWC.
  • Figure 4: Average Generalization Results of the Model on CIFAR-10-C (left) and CIFAR-100-C (right).
  • Figure 5: Comparison of anti-corruption performance stability across methods on CIFAR-10-C.
  • ...and 7 more figures
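
The inference chain described in the caption of Figure 2(c) can be written out step by step. The following is only one possible reading of that caption, not the paper's formulation: the variational distributions $q(\cdot)$ and the map $g$ are hypothetical names introduced here, and treating $c$ as a deterministic function of $s$ is an assumption.

```latex
\begin{align}
  z &\sim q(z \mid x_z)        && \text{variant factor inferred from variant features} \\
  s &\sim q(s \mid x_s, z)     && \text{invariant factor, with a weak causal influence from } z \\
  c &= g(s)                    && \text{causal semantic representation derived from } s \\
  \hat{y} &\sim p(y \mid c, x_s) && \text{prediction from } c \text{ and the invariant features}
\end{align}
```

Under this reading, the dependence of $s$ on $z$ is what distinguishes panel (c) from a purely invariant-feature model, in which $s$ would be inferred from $x_s$ alone.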