Federated style aware transformer aggregation of representations

Mincheol Jeon, Euinam Huh

TL;DR

Federated learning often suffers from domain heterogeneity and data imbalance, which degrade personalization when a single global model is used. FedSTAR tackles this by explicitly disentangling content from style in client representations and by using a Transformer-based attention mechanism to adaptively aggregate class-wise content prototypes, while exchanging only compact prototypes to maintain communication efficiency. The method combines content–style decomposition with StyleFiLM personalization and attention-driven prototype fusion to produce personalized yet globally coherent representations, demonstrated across multiple benchmarks with strong non-IID performance. Empirical results show state-of-the-art accuracy and robustness, with ablations confirming that both adaptive aggregation and personalization are essential for performance gains in heterogeneous environments.
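The StyleFiLM personalization mentioned above is not specified in this summary. As a rough illustration only, the sketch below assumes it follows the general feature-wise linear modulation (FiLM) pattern, in which a client's style vector produces a per-feature scale and shift that is applied to a content representation. The function name `film_modulate` and the projection matrices `W_gamma` and `W_beta` are hypothetical and are not taken from the paper.

```python
import numpy as np

def film_modulate(content, style, W_gamma, W_beta):
    """FiLM-style modulation of content features by a style vector.

    content: (batch, dim) content features
    style:   (style_dim,) client style vector
    W_gamma, W_beta: (style_dim, dim) hypothetical projection matrices
    """
    # Scale is parameterized around 1 so that zero weights leave content unchanged.
    gamma = 1.0 + style @ W_gamma   # (dim,)
    beta = style @ W_beta           # (dim,)
    # Broadcast the per-feature scale and shift over the batch dimension.
    return gamma * content + beta
```

With all-zero projections the function returns the content features unchanged, which makes the personalization effect easy to isolate when experimenting.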

Abstract

Personalized Federated Learning (PFL) faces persistent challenges, including domain heterogeneity from diverse client data, data imbalance due to skewed participation, and strict communication constraints. Traditional federated learning often lacks personalization, as a single global model cannot capture client-specific characteristics, leading to biased predictions and poor generalization, especially for clients with highly divergent data distributions. To address these issues, we propose FedSTAR, a style-aware federated learning framework that disentangles client-specific style factors from shared content representations. FedSTAR aggregates class-wise prototypes using a Transformer-based attention mechanism, allowing the server to adaptively weight client contributions while preserving personalization. Furthermore, by exchanging compact prototypes and style vectors instead of full model parameters, FedSTAR significantly reduces communication overhead. Experimental results demonstrate that combining content-style disentanglement with attention-driven prototype aggregation improves personalization and robustness in heterogeneous environments without increasing communication cost.
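To make the idea of attention-driven prototype aggregation more concrete, the following is a minimal sketch, not the authors' implementation. It assumes single-head scaled dot-product attention with no learned projections, and it assumes the per-class query is the plain mean of the client prototypes; the paper's Transformer-based aggregator is presumably richer. The function name `aggregate_prototypes` and the array layout are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def aggregate_prototypes(client_protos, query=None):
    """Attention-weighted fusion of class-wise prototypes.

    client_protos: (num_clients, num_classes, dim) array, one prototype
                   per client and class (assumed layout).
    query:         optional (num_classes, dim) array; defaults to the
                   per-class mean over clients (an assumption).
    Returns (global_protos, weights) with shapes
    (num_classes, dim) and (num_classes, num_clients).
    """
    num_clients, num_classes, dim = client_protos.shape
    if query is None:
        query = client_protos.mean(axis=0)
    # scores[c, k]: similarity between the class-c query and client k's prototype.
    scores = np.einsum("cd,kcd->ck", query, client_protos) / np.sqrt(dim)
    # Normalize over clients so each class gets a convex combination.
    weights = softmax(scores, axis=1)
    global_protos = np.einsum("ck,kcd->cd", weights, client_protos)
    return global_protos, weights
```

In this sketch each client uploads `num_classes * dim` floats per round rather than a full set of model parameters, which is the source of the communication savings described above.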

Paper Structure

This paper contains 28 sections, 18 equations, 3 figures, 4 tables, 1 algorithm.

Figures (3)

  • Figure 1: Performance comparison of FedSTAR and the ablation variant on the Fashion-MNIST dataset, showing training accuracy and test performance over rounds. FedSTAR consistently outperforms the ablation variant in all metrics.
  • Figure 2: Performance comparison of FedSTAR and the Ablation variant on the Fashion-MNIST dataset.
  • Figure 3: UMAP embeddings of Fashion-MNIST under three federated learning methods. From left to right: FedProto, Ablation variant (attention-only), and FedSTAR. FedSTAR achieves the most compact and discriminative class clusters.