FedIA: A Plug-and-Play Importance-Aware Gradient Pruning Aggregation Method for Domain-Robust Federated Graph Learning on Node Classification
Zhanting Zhou, KaHou Tam, Zeqin Wu, Pengzhao Sun, Jinbo Wang, Fengli Zhang
TL;DR
FedIA targets domain skew in Federated Graph Learning by pre-conditioning client gradients rather than relying on aggregation alone. It introduces a two-stage, server-side gradient filter: (i) a global top-$\rho$ mask that projects updates onto a sparse, informative subspace, and (ii) an Influence-Regularised Momentum weighting that down-weights outlier clients, all without extra communication. The method yields smoother convergence and higher final accuracy on both homogeneous Twitch-like and heterogeneous WikiNet graphs, with a convergence rate that matches standard compressed/projected SGD bounds, $\min_{t\le T} \mathbb{E}\|\nabla f(W^t)\|^2 = O\left(\sigma^{2}/\sqrt{T}\right)$. FedIA is model-agnostic and plug-and-play, and gradient sparsification offers privacy advantages, making it a practical baseline for robust, domain-agnostic federated graph learning.
Abstract
Federated Graph Learning (FGL) under domain skew -- as observed on platforms such as \emph{Twitch Gamers} and multilingual \emph{Wikipedia} networks -- drives client models toward incompatible representations, rendering naive aggregation both unstable and ineffective. We find that the culprit is not the weighting scheme but the \emph{noisy gradient signal}: empirical analysis of baseline methods suggests that a vast majority of gradient dimensions can be dominated by domain-specific variance. We therefore shift focus from "aggregation-first" to a \emph{projection-first} strategy that denoises client updates \emph{before} they are combined. The proposed FedIA framework realises this \underline{I}mportance-\underline{A}ware idea through a two-stage, plug-and-play pipeline: (i) a server-side top-$\rho$ mask keeps only the most informative coordinates (about 5%), and (ii) a lightweight influence-regularised momentum weight suppresses outlier clients. FedIA adds \emph{no extra uplink traffic and only negligible server memory}, making it readily deployable. On both homogeneous (Twitch Gamers) and heterogeneous (Wikipedia) graphs, it yields smoother, more stable convergence and higher final accuracy than nine strong baselines. A convergence sketch further shows that dynamic projection maintains the optimal $\mathcal{O}(\sigma^{2}/\sqrt{T})$ rate.
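As a rough illustration of the two-stage, projection-first pipeline, the NumPy sketch below masks client gradients with a shared top-$\rho$ mask and then weights clients before averaging. It is not the authors' implementation: the abstract does not give the importance score or the weighting formula, so the mean-absolute-gradient score, the cosine-similarity influence measure, the momentum factor `beta`, and all function names here are assumptions chosen for illustration.

```python
import numpy as np


def top_rho_mask(client_grads, rho=0.05):
    """Boolean mask keeping the top-rho fraction of coordinates.

    Assumption: importance is the mean absolute gradient across clients.
    """
    scores = np.abs(client_grads).mean(axis=0)
    k = max(1, int(round(rho * scores.size)))
    keep = np.argpartition(scores, -k)[-k:]  # indices of the k largest scores
    mask = np.zeros(scores.size, dtype=bool)
    mask[keep] = True
    return mask


def influence_weights(masked_grads, prev_weights=None, beta=0.9, eps=1e-12):
    """Client weights that suppress outliers, smoothed with momentum.

    Assumption: influence is the cosine similarity between a client's masked
    update and the mean masked update, clipped at zero and normalised.
    """
    mean_update = masked_grads.mean(axis=0)
    norms = np.linalg.norm(masked_grads, axis=1) * np.linalg.norm(mean_update)
    cos = (masked_grads @ mean_update) / (norms + eps)
    raw = np.clip(cos, 0.0, None)
    if raw.sum() <= eps:  # fall back to uniform weights if all are clipped
        raw = np.ones_like(raw)
    raw = raw / raw.sum()
    if prev_weights is None:
        return raw
    smoothed = beta * prev_weights + (1.0 - beta) * raw
    return smoothed / smoothed.sum()


def aggregate(client_grads, rho=0.05, prev_weights=None, beta=0.9):
    """Mask first, then weight: returns (update, mask, weights)."""
    mask = top_rho_mask(client_grads, rho)
    masked = client_grads * mask  # zero out the discarded coordinates
    weights = influence_weights(masked, prev_weights, beta)
    return weights @ masked, mask, weights


# Toy round: three similar clients and one client pointing the opposite way.
rng = np.random.default_rng(0)
base = rng.normal(size=100)
grads = np.stack([base + 0.01 * rng.normal(size=100) for _ in range(3)] + [-base])
update, mask, weights = aggregate(grads, rho=0.05)
```

In this toy round the mask keeps 5 of 100 coordinates, and the opposing client receives a much smaller weight than the three that agree. Because the mask is computed on the server from gradients the clients already send, a sketch of this shape needs no additional uplink traffic, consistent with the claim in the abstract.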
