FedIA: A Plug-and-Play Importance-Aware Gradient Pruning Aggregation Method for Domain-Robust Federated Graph Learning on Node Classification

Zhanting Zhou, KaHou Tam, Zeqin Wu, Pengzhao Sun, Jinbo Wang, Fengli Zhang

TL;DR

FedIA targets domain skew in Federated Graph Learning by pre-conditioning client gradients rather than relying on aggregation alone. It introduces a two-stage, server-side gradient filter: (i) a global top-$\rho$ mask that projects updates onto a sparse, informative subspace and (ii) an Influence-Regularised Momentum weighting that down-weights outliers, all without extra communication. The method yields smoother convergence and higher final accuracy across both homogeneous Twitch-like and heterogeneous WikiNet graphs, with a convergence rate that matches standard compressed/projected SGD bounds, $\min_{t\le T} \mathbb{E}\|\nabla f(W^t)\|^2 = O\left(\sigma^{2}/\sqrt{T}\right)$. FedIA is model-agnostic, plug-and-play, and offers privacy advantages via gradient sparsification, making it a practical baseline for robust domain-agnostic federated graph learning.

Abstract

Federated Graph Learning (FGL) under domain skew -- as observed on platforms such as \emph{Twitch Gamers} and multilingual \emph{Wikipedia} networks -- drives client models toward incompatible representations, rendering naive aggregation both unstable and ineffective. We find that the culprit is not the weighting scheme but the \emph{noisy gradient signal}: empirical analysis of baseline methods suggests that a vast majority of gradient dimensions can be dominated by domain-specific variance. We therefore shift focus from "aggregation-first" to a \emph{projection-first} strategy that denoises client updates \emph{before} they are combined. The proposed FedIA framework realises this \underline{I}mportance-\underline{A}ware idea through a two-stage, plug-and-play pipeline: (i) a server-side top-$\rho$ mask keeps only the most informative coordinates (about 5%), and (ii) a lightweight influence-regularised momentum weight suppresses outlier clients. FedIA adds \emph{no extra uplink traffic and only negligible server memory}, making it readily deployable. On both homogeneous (Twitch Gamers) and heterogeneous (Wikipedia) graphs, it yields smoother, more stable convergence and higher final accuracy than nine strong baselines. A convergence sketch further shows that dynamic projection maintains the optimal $\mathcal{O}(\sigma^{2}/\sqrt{T})$ rate.
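The two-stage pipeline described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the exact importance score or influence formula, so the sketch assumes (a) importance is the magnitude of the mean client gradient and (b) influence is the non-negative cosine similarity between each masked client update and the mean masked update, smoothed with a momentum factor `beta`. The function names `top_rho_mask`, `influence_weights`, and `fedia_aggregate_sketch` are hypothetical.

```python
import numpy as np


def top_rho_mask(client_grads, rho=0.05):
    """Boolean mask keeping the top-rho fraction of coordinates.

    client_grads: array of shape (K, d), one flattened gradient per client.
    ASSUMPTION: importance = |mean gradient across clients|.
    """
    importance = np.abs(client_grads.mean(axis=0))
    k = max(1, int(round(rho * importance.size)))
    top_idx = np.argpartition(importance, -k)[-k:]  # indices of the k largest scores
    mask = np.zeros(importance.size, dtype=bool)
    mask[top_idx] = True
    return mask


def influence_weights(masked_grads, prev_weights, beta=0.9, eps=1e-12):
    """Momentum-smoothed client weights.

    ASSUMPTION: influence = cosine similarity to the mean masked update,
    clipped at zero so that opposing (outlier) updates gain no new weight.
    """
    mean_update = masked_grads.mean(axis=0)
    denom = np.linalg.norm(masked_grads, axis=1) * np.linalg.norm(mean_update) + eps
    raw = np.clip(masked_grads @ mean_update / denom, 0.0, None)
    raw = raw / (raw.sum() + eps)
    weights = beta * prev_weights + (1.0 - beta) * raw  # momentum smoothing
    return weights / weights.sum()


def fedia_aggregate_sketch(client_grads, prev_weights, rho=0.05, beta=0.9):
    """Stage 1: project onto the masked subspace. Stage 2: weighted average."""
    mask = top_rho_mask(client_grads, rho)
    masked = client_grads * mask  # zero out the pruned coordinates
    weights = influence_weights(masked, prev_weights, beta)
    return weights @ masked, weights, mask
```

Both stages run on the server using only the gradients the clients already upload, which is consistent with the claim of no extra uplink traffic; the only persistent state in this sketch is the previous weight vector of length K.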

Paper Structure

This paper contains 47 sections, 11 equations, 3 figures, 7 tables, 1 algorithm.

Figures (3)

  • Figure 1: Overview of FedIA in a domain-skewed FGL scenario. (a) Domain-skewed clients (e.g., institutions from different sources) produce gradients dominated by domain-specific noise, causing divergent global representations. (b) FedIA applies a two-stage server-side filter: a global top-$\rho$ mask preserves approximately 5% of key coordinates (denoising), followed by influence-regularised momentum weighting (outlier suppression). (c) After projection, training curves stabilize and the global model converges consistently, without extra communication or client-side state overhead.
  • Figure 2: Empirical evidence of core challenges in domain-skewed FGL and the effectiveness of our approach. (a) CKA-based visualization reveals severe representation inconsistency between local models and the global model under FedAvg. Our approach (FedAvg+IA) promotes greater uniformity, indicating enhanced domain generalization. (b) Our method achieves a superior final convergence state (lower validation loss) compared to baselines. This is accomplished by operating on a highly sparse gradient subspace, retaining only a core 5.04% of the parameters. (c) The state-of-the-art method, FGGP, demonstrates significant convergence instability across various settings, highlighting the fragility of existing aggregation schemes.
  • Figure 3: Effect of hyperparameter variations under setting 1 (cross-domain) and setting 3 (domain skew) on Twitch. Each value is the average performance of FedIA across six different methods. The red box marks the hyperparameter combinations that exceed the average value of the baseline without FedIA, indicating that FedIA's superior performance does not rely on specific hyperparameter combinations but holds across a wide range of them.
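
Figure 2(a) relies on a CKA-based comparison between local and global representations. As a reference point, the following is a minimal sketch of linear CKA (centered kernel alignment) between two feature matrices. The captions do not say which CKA variant or which layer's features were used, so the linear variant and the function name `linear_cka` are assumptions made for illustration.

```python
import numpy as np


def linear_cka(X, Y):
    """Linear CKA between feature matrices X (n, p) and Y (n, q).

    Rows are the same n examples (e.g. nodes) embedded by two models.
    Returns a value in [0, 1]; 1 means the representations match up to
    rotation and isotropic scaling.
    """
    # Center each feature column so the score ignores constant offsets.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, ord="fro")
    norm_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return cross / (norm_x * norm_y)
```

Computing this score for every client model against the global model on a shared set of nodes would give the kind of similarity values that such a visualization summarizes; low scores indicate the representation inconsistency the caption describes.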