Sheaf HyperNetworks for Personalized Federated Learning

Bao Nguyen, Lorenzo Sani, Xinchi Qiu, Pietro Liò, Nicholas D. Lane

TL;DR

This work tackles personalized federated learning (PFL) under data heterogeneity by addressing limitations of Graph HyperNetworks (GHNs) such as over-smoothing and heterophily. It introduces Sheaf HyperNetworks (SHNs), which combine cellular sheaf diffusion with hypernetworks to enable richer cross-client parameter sharing, and adds a privacy-preserving method for constructing client relation graphs from learned client embeddings. Across multi-class classification, traffic forecasting, and weather forecasting, SHNs consistently outperform baselines including pFedHN, Panacea, and GHN, with accuracy up to $2.7\%$ higher and MSE up to $5.3\%$ lower than the best-performing baseline in challenging non-IID settings. The three-stage training pipeline (Federated HyperNetwork Training, Client Relation Graph Construction, and Federated Sheaf HyperNetwork Training), together with the sheaf diffusion mechanism, provides a framework for scalable, expressive, and privacy-preserving PFL across diverse domains.

Abstract

Graph hypernetworks (GHNs), constructed by combining graph neural networks (GNNs) with hypernetworks (HNs), leverage relational data across various domains such as neural architecture search, molecular property prediction, and federated learning. Despite GNNs and HNs being individually successful, we show that GHNs present problems compromising their performance, such as over-smoothing and heterophily. Moreover, we cannot apply GHNs directly to personalized federated learning (PFL) scenarios, where an a priori client relation graph may be absent, private, or inaccessible. To mitigate these limitations in the context of PFL, we propose a novel class of HNs, sheaf hypernetworks (SHNs), which combine cellular sheaf theory with HNs to improve parameter sharing for PFL. We thoroughly evaluate SHNs across diverse PFL tasks, including multi-class classification, traffic forecasting, and weather forecasting. Additionally, we provide a methodology for constructing client relation graphs in scenarios where such graphs are unavailable. We show that SHNs consistently outperform existing PFL solutions in complex non-IID scenarios. While the baselines' performance fluctuates depending on the task, SHNs improve accuracy by up to 2.7% and reduce mean squared error by up to 5.3% relative to the best-performing baseline.

Paper Structure (33 sections, 7 equations, 12 figures, 7 tables, 3 algorithms)

Figures (12)

  • Figure 1: (1) The hypernetwork (HN) is trained to generate personalized parameters for each client participating in personalized federated learning (PFL). (2) The learned client embeddings are extracted from the HN to construct a client relation graph: cosine similarity thresholding or $k$-nearest neighbors is used to build edges between similar embeddings. (3) The Sheaf HyperNetwork (SHN) operates on the client relation graph. The cellular sheaf projects each client node embedding into the higher-order stalk space and uses linear restriction maps, learned by the multi-layer perceptron (MLP) $\Phi$, to guide the diffusion process. After several iterations of sheaf diffusion, the enriched client embeddings are passed to an HN to generate personalized parameters for each client.
  • Figure 2: (a) t-SNE plot of client embeddings for Cluster CIFAR100. Each marker denotes a client belonging to one of the $20$ superclasses. (b) Client relation graph constructed by cosine similarity thresholding with a threshold of $0.95$. (c) Client relation graph constructed by KNN with $k=4$. (d) Client relation graph constructed by KNN with $k=1$.
  • Figure 3: Cluster CIFAR100. (a) pFedHN versus SHN for $10$k communication rounds with optimal hyperparameters. (b) SHN (finetuned), initialized from a pre-trained personalized HyperNetwork (pFedHN), against SHN (from scratch), which does not adopt pFedHN's pre-trained parameters.
  • Figure 4: Personalized test accuracy (%) on CIFAR100. (a) As the number of message-passing layers increases. (b) As the cosine threshold decreases. (c) As the number of $k$-nearest neighbors increases. More details can be found in the appendix on the effectiveness of sheaves.
  • Figure 5: t-SNE plot of the learnt client embeddings for Cluster CIFAR100. Each marker denotes a client belonging to one of the twenty CIFAR100 superclasses.
  • ...and 7 more figures
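
The graph construction step described in the Figure 1 and Figure 2 captions (cosine similarity thresholding or $k$-nearest neighbors over learned client embeddings) can be sketched roughly as follows. This is a minimal illustration using NumPy, not the authors' implementation: the function names, the toy embeddings, the "at least the threshold" edge rule, and the choice to symmetrize the KNN graph are all assumptions made for the example.

```python
import numpy as np


def cosine_similarity_matrix(emb):
    """Pairwise cosine similarity between rows of emb, shape (n_clients, dim)."""
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = emb / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T


def threshold_graph(emb, tau):
    """Assumed rule: connect distinct clients whose similarity is at least tau."""
    adj = cosine_similarity_matrix(emb) >= tau
    np.fill_diagonal(adj, False)  # no self-loops
    return adj


def knn_graph(emb, k):
    """Connect each client to its k most similar clients (requires k < n_clients).

    The result is symmetrized, which is an assumption of this sketch.
    """
    sim = cosine_similarity_matrix(emb)
    np.fill_diagonal(sim, -np.inf)  # exclude each client from its own neighbors
    nearest = np.argsort(-sim, axis=1)[:, :k]
    adj = np.zeros(sim.shape, dtype=bool)
    rows = np.repeat(np.arange(emb.shape[0]), k)
    adj[rows, nearest.ravel()] = True
    return adj | adj.T


# Toy embeddings (hypothetical): clients 0 and 1 are similar, as are 2 and 3.
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
adj_threshold = threshold_graph(emb, tau=0.95)
adj_knn = knn_graph(emb, k=1)
```

On this toy input both variants link client 0 with client 1 and client 2 with client 3. Because only embeddings are compared, a construction of this kind does not need access to the clients' raw data.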

Theorems & Definitions (10)

  • Definition 2.1
  • Definition 2.2
  • Definition 3.1
  • Definition 3.2
  • Definition A.1
  • Definition A.2
  • Definition A.3
  • Definition A.4
  • Definition A.5
  • Definition A.6