Decentralized Domain Generalization with Style Sharing: Formal Model and Convergence Analysis

Shahryar Zehtabi, Dong-Jun Han, Seyyedali Hosseinalipour, Christopher G. Brinton

TL;DR

This work addresses domain generalization in decentralized federated settings by introducing StyleDDG, a fully peer-to-peer DG algorithm that shares and exploits style statistics across one-hop neighbors. The authors provide a formal modeling framework for style-based DG, extending centralized methods like MixStyle and DSU to decentralized networks and deriving convergence guarantees under standard non-convex optimization assumptions. StyleDDG integrates consensus-based gradient updates with a three-stage style exploration pipeline (StyleShift, StyleExplore, MixStyle) to augment styles while keeping communication overhead minimal. Empirical results on PACS and VLCS show that StyleDDG achieves superior generalization to unseen target domains across varying graph connectivities and model sizes, validating both its theoretical and practical merits. The work advances the state of DG in distributed settings by delivering the first convergence analysis for style-based DG and a scalable, communication-efficient algorithm for fully decentralized networks.

Abstract

Much of federated learning (FL) focuses on settings where local dataset statistics remain the same between training and testing. However, this assumption often does not hold in practice due to distribution shifts, motivating the development of domain generalization (DG) approaches that leverage source domain data to train models capable of generalizing to unseen target domains. In this paper, we are motivated by two major gaps in existing work on FL and DG: (1) the lack of formal mathematical analysis of DG objectives; and (2) DG research in FL being limited to the star-topology architecture. We develop Decentralized Federated Domain Generalization with Style Sharing ($\textit{StyleDDG}$), a decentralized DG algorithm which allows devices in a peer-to-peer network to achieve DG based on sharing style information inferred from their datasets. Additionally, we provide the first systematic approach to analyzing style-based DG training in decentralized networks. We cast existing centralized DG algorithms within our framework, and employ their formalisms to model $\textit{StyleDDG}$. We then obtain analytical conditions under which convergence of $\textit{StyleDDG}$ can be guaranteed. Through experiments on popular DG datasets, we demonstrate that $\textit{StyleDDG}$ can obtain significant improvements in accuracy across target domains with minimal communication overhead compared to baseline decentralized gradient methods.

Paper Structure

This paper contains 30 sections, 4 theorems, 57 equations, 3 figures, 4 tables, 1 algorithm.

Key Result

Proposition 5.4

(Lipschitz continuity of style statistics) Let Assumption (assump:h) hold. Then the style statistics from Eq. (eqn:style_statistics) are Lipschitz continuous at each layer/block $\ell \in \{1, \dots, L\}$ of the CNN $\theta$, for all devices $i \in \mathcal{M}$ and all channels $c \in \{1, \dots, C\}$. Spec...

Proof. See our technical report (zehtabi2025decentralized).
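To make the objects in this proposition concrete, the sketch below computes per-channel style statistics (mean and standard deviation of a layer's outputs) for one instance, and restyles a feature map with a convex mix of two styles in the spirit of MixStyle. It is a minimal illustration, not the authors' implementation: the function names, the `EPS` constant, the flat-list data layout, and the Beta(0.1, 0.1) mixing weight are all assumptions made for this example.

```python
import math
import random
from statistics import fmean, pvariance

EPS = 1e-6  # assumed small constant for numerical stability


def style_statistics(feature_map):
    """Return per-channel (mu, sigma) for one instance.

    feature_map: list of C channels, each a flat list of H*W activations.
    """
    mu = [fmean(channel) for channel in feature_map]
    sigma = [math.sqrt(pvariance(channel) + EPS) for channel in feature_map]
    return mu, sigma


def apply_style(feature_map, new_mu, new_sigma):
    """Normalize every channel, then rescale it to the target style."""
    mu, sigma = style_statistics(feature_map)
    return [
        [new_sigma[c] * (v - mu[c]) / sigma[c] + new_mu[c] for v in channel]
        for c, channel in enumerate(feature_map)
    ]


def mix_style(own, other, lam):
    """Restyle `own` with a convex combination of its style and `other`'s."""
    mu_a, sigma_a = style_statistics(own)
    mu_b, sigma_b = style_statistics(other)
    mixed_mu = [lam * a + (1 - lam) * b for a, b in zip(mu_a, mu_b)]
    mixed_sigma = [lam * a + (1 - lam) * b for a, b in zip(sigma_a, sigma_b)]
    return apply_style(own, mixed_mu, mixed_sigma)


# Hypothetical usage: two instances with 3 channels and 16 activations each.
rng = random.Random(0)
x = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(3)]
y = [[rng.gauss(2.0, 3.0) for _ in range(16)] for _ in range(3)]
lam = rng.betavariate(0.1, 0.1)  # assumed mixing-weight distribution
x_mixed = mix_style(x, y, lam)
```

Because the statistics are plain means and standard deviations of the layer outputs, their smoothness depends on how those outputs vary with the model parameters, which is the dependence the proposition bounds.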

Figures (3)

  • Figure 1: Proposed DDG methodology. (a) A decentralized network of $m=9$ devices, each holding data from one of three domains (painting, cartoon, photo). The performance of DDG is evaluated on the fourth domain (sketch). (b) In StyleDDG, each device first uses the style statistics of a neighbor to style-shift a portion of its mini-batch (moving from the black dots to the red dots). It then concatenates the original and shifted points and selects a new portion of them for style extrapolation, which generates new styles (grey dots). The circles illustrate the style space at the output of a neural network layer. (c) After randomly shifting a portion of the points in the original mini-batch to a new style, and then randomly extrapolating a new portion of the outputs, we are left with a new set of styles equal in size to the original mini-batch. In StyleDDG, we take these new points (red triangles) and apply MixStyle within them.
  • Figure 2: Illustration of style statistics. (a) A decentralized network in which each device has access to data from only a single domain. (b) An overview of the ResNet18 model (He et al., 2016) as an example model at each device, with the number of channels and output size of each block indicated. We highlight the blocks/layers that StyleDDG is applied to, which are the layers from which style statistics are extracted. (c) The outputs of a particular layer for all instances in a batch of size $B$. The style statistics that each device shares with its neighbors are computed at the batch level as shown. (d) For a given instance in the batch, the style statistics are calculated for each channel separately. (e) The $\mu$ and $\sigma$ of the style statistics are computed as the mean and standard deviation of the layer outputs.
  • Figure 3: Results on the PACS dataset for different target domains, for a network of $m=9$ clients over a random geometric graph with varying radius $r = 0.4, 0.8, \sqrt{2}$.
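The network setup in Figure 3 and the consensus-based updates mentioned in the TL;DR can be sketched as follows. This is only an illustration under stated assumptions: devices are placed uniformly in the unit square (so that a radius of $\sqrt{2}$ yields a fully connected graph), mixing weights follow the common Metropolis-Hastings rule, and each device holds a single scalar parameter. None of these choices, nor the function names, are confirmed by the text above.

```python
import math
import random


def random_geometric_graph(m, radius, rng):
    """Place m devices uniformly in the unit square; link pairs within `radius`."""
    points = [(rng.random(), rng.random()) for _ in range(m)]
    neighbors = {i: set() for i in range(m)}
    for i in range(m):
        for j in range(i + 1, m):
            if math.dist(points[i], points[j]) <= radius:
                neighbors[i].add(j)
                neighbors[j].add(i)
    return neighbors


def metropolis_weights(neighbors):
    """Build a symmetric mixing matrix whose rows sum to one (assumed rule)."""
    m = len(neighbors)
    weights = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in neighbors[i]:
            degree = max(len(neighbors[i]), len(neighbors[j]))
            weights[i][j] = 1.0 / (1 + degree)
        # Off-diagonal entries of row i are now set; the diagonal takes the rest.
        weights[i][i] = 1.0 - sum(weights[i])
    return weights


def consensus_gradient_step(params, grads, weights, lr):
    """One step: average with one-hop neighbors, then take a local gradient step."""
    m = len(params)
    return [
        sum(weights[i][j] * params[j] for j in range(m)) - lr * grads[i]
        for i in range(m)
    ]


# Hypothetical usage with m = 9 devices and the three radii from Figure 3.
for r in (0.4, 0.8, math.sqrt(2)):
    graph = random_geometric_graph(9, r, random.Random(0))
    mixing = metropolis_weights(graph)
    params = [float(i) for i in range(9)]
    params = consensus_gradient_step(params, [0.0] * 9, mixing, lr=0.1)
```

A larger radius gives each device more one-hop neighbors, so parameters (and, in StyleDDG, style statistics) spread across the network in fewer rounds.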

Theorems & Definitions (15)

  • Remark 4.1
  • Remark 5.2
  • Proposition 5.4
  • Proposition 5.5
  • Theorem 5.6
  • Lemma A.1
  • Proof B.1
  • Proof B.2
  • Proof B.3
  • Proof B.4
  • ...and 5 more