Closing the Generalization Gap in Parameter-efficient Federated Edge Learning

Xinnong Du, Zhonghao Lyu, Xiaowen Cao, Chunyang Wen, Shuguang Cui, Jie Xu

TL;DR

This paper tackles the generalization gap in parameter-efficient FEEL under non-IID data and resource constraints. It introduces an information-theoretic generalization bound and embeds it into a convergence analysis to guide a joint optimization of client participation, pruning, and wireless/computational resources via alternating optimization. The proposed framework yields a suboptimal yet effective solution that improves test accuracy while honoring energy and delay limits, as demonstrated on MNIST and CIFAR-10 with Dirichlet-based non-IID partitions. The work highlights the value of integrating generalization-aware analysis with system-level optimization to enhance robustness and efficiency of edge AI deployments.
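The Dirichlet-based non-IID partitions mentioned above can be illustrated with a short sketch. It assumes the common per-class scheme, in which each class's samples are split across clients according to a draw from a symmetric Dirichlet distribution. The function name `dirichlet_partition`, the toy label vector, and the client count are hypothetical; the summary does not specify the paper's exact partitioning procedure. The concentration parameter is called `sigma` to match the figure captions.

```python
import numpy as np

def dirichlet_partition(labels, num_clients, sigma, seed=0):
    """Split sample indices across clients using per-class Dirichlet shares.

    Assumed scheme (not confirmed by the source): for every class, draw client
    shares from Dir(sigma, ..., sigma) and cut that class's samples accordingly.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        # Share of this class given to each client; small sigma gives skewed shares.
        shares = rng.dirichlet(np.full(num_clients, sigma))
        # Turn cumulative shares into split points over this class's samples.
        cuts = (np.cumsum(shares)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return [np.array(sorted(ix), dtype=int) for ix in client_indices]

# Toy usage: 10 classes with 100 samples each, split across 5 clients.
labels = np.repeat(np.arange(10), 100)
parts = dirichlet_partition(labels, num_clients=5, sigma=0.5)
```

Under this scheme, a smaller `sigma` concentrates each class on fewer clients (stronger heterogeneity), while a larger `sigma` pushes the split toward a uniform, near-IID allocation.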

Abstract

Federated edge learning (FEEL) provides a promising foundation for edge artificial intelligence (AI) by enabling collaborative model training while preserving data privacy. However, limited and heterogeneous local datasets, as well as resource-constrained deployment, severely degrade both model generalization and resource utilization, compromising learning performance. We therefore propose a parameter-efficient FEEL framework that jointly leverages model pruning and client selection to tackle these challenges. First, we derive an information-theoretic generalization statement that characterizes the discrepancy between the training and testing losses, and embed it into the convergence analysis. The analysis reveals that a larger local generalization statement can undermine global convergence. Then, we formulate a problem that minimizes a generalization-aware bound on the average squared gradient norm by jointly optimizing the pruning ratios, client selection, and communication-computation resources under energy and delay constraints. Despite its non-convexity, the resulting mixed-integer problem is efficiently solved via an alternating optimization algorithm. Extensive experiments demonstrate that the proposed design achieves better learning performance than state-of-the-art baselines, validating the effectiveness of coupling generalization-aware analysis with system-level optimization for efficient FEEL.
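The abstract refers to per-client pruning ratios but does not state the pruning criterion. As an illustration only, the sketch below assumes unstructured magnitude pruning, where a ratio $\rho$ removes the fraction $\rho$ of weights with the smallest absolute value. The function name `magnitude_prune` and the toy weight matrix are hypothetical and are not taken from the paper.

```python
import numpy as np

def magnitude_prune(weights, ratio):
    """Zero out the `ratio` fraction of entries with the smallest magnitude.

    Assumption: unstructured magnitude pruning; the paper's criterion may differ.
    Returns the pruned weights and the boolean keep-mask.
    """
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must lie in [0, 1)")
    flat = np.abs(weights).ravel()
    k = int(ratio * flat.size)  # number of weights to remove
    if k == 0:
        return weights.copy(), np.ones(weights.shape, dtype=bool)
    # Indices of the k smallest-magnitude entries.
    drop = np.argpartition(flat, k - 1)[:k]
    mask = np.ones(flat.size, dtype=bool)
    mask[drop] = False
    mask = mask.reshape(weights.shape)
    return weights * mask, mask

# Toy usage: prune half of a 2x3 weight matrix.
w = np.array([[0.5, -0.1, 2.0], [0.05, -1.5, 0.3]])
pruned, mask = magnitude_prune(w, ratio=0.5)
```

A higher ratio shrinks the model that a client must compute and upload, which is the lever the joint optimization trades off against learning performance under the energy and delay constraints.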

Paper Structure

This paper contains 25 sections, 3 theorems, 53 equations, 8 figures, 1 table, 1 algorithm.

Key Result

Lemma 1

For any model $\boldsymbol{\omega}$, the norm of the difference between the gradients evaluated over the training set $\hat{\mathcal{D}}$ and test set $\tilde{\mathcal{D}}$ is upper bounded by where $\phi$ is defined as the generalization statement, derived as $\phi=[\frac{(\hat{D}+\tilde{D})}{\mathrm{p}^{'}(z\vert\hat{\mathcal{D}})}\cdot \vert\frac{\sqrt{2(\mathrm{H}(\mathrm{p}(z\vert\t...

Figures (8)

  • Figure 1: Illustration of the considered FEEL system over wireless communication networks with model pruning.
  • Figure 2: Hierarchical structure of the local dataset at client $n$.
  • Figure 3: Impact of data heterogeneity on sample distributions and generalization statements with different values of $\sigma$.
  • Figure 4: Classification accuracy of ResNet-110 versus Dirichlet parameter $\sigma$ with/without generalization statement under $T_0{=}3600$ and $E_0{=}7100$.
  • Figure 5: Convergence behavior in terms of the training loss over the overall training time. (a) LeNet on MNIST under $E_0=250$ J; (b) ResNet-110 on CIFAR-10 under $E_0=7100$ J.
  • ...and 3 more figures

Theorems & Definitions (5)

  • Lemma 1
  • Proposition 1
  • Remark 1
  • Theorem 1
  • Remark 2