Pre-Training and Personalized Fine-Tuning via Over-the-Air Federated Meta-Learning: Convergence-Generalization Trade-Offs

Haifeng Wen; Hong Xing; Osvaldo Simeone

TL;DR

This work studies a wireless implementation of meta-learning-based personalized federated learning (meta-pFL) using over-the-air computation (AirFL). It introduces Air-meta-pFL, a protocol that combines sparsification, memory-based error compensation, and phase-aware scaling to send hyperparameter updates over a shared wireless multiple access channel (MAC), and derives convergence bounds for constant and adaptive learning rates. A mutual-information-based generalization bound is established, revealing a trade-off: wireless impairments can improve generalization while potentially slowing convergence. Numerical experiments on Omniglot and image datasets validate the theory, showing how data heterogeneity and communication resources shape both convergence and generalization in a practical wireless setting.
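The uplink step summarized above can be illustrated with a minimal sketch of a single round. It assumes top-k sparsification, a per-device error memory that re-injects previously untransmitted residuals, i.i.d. Rayleigh fading $h \sim \mathcal{CN}(0,1)$ known at each device, additive Gaussian noise at the server, and "phase-aware scaling" modeled as a phase-only precoder with a common gain `gamma`. The names `top_k` and `air_round` and all parameters are hypothetical; this is not the paper's exact protocol.

```python
import cmath
import math
import random


def top_k(vec, k):
    """Keep the k largest-magnitude entries of vec and zero the rest."""
    keep = sorted(range(len(vec)), key=lambda j: abs(vec[j]), reverse=True)[:k]
    out = [0.0] * len(vec)
    for j in keep:
        out[j] = vec[j]
    return out


def air_round(updates, memories, k, gamma, noise_std, rng):
    """One hypothetical over-the-air uplink round.

    updates:  per-device update vectors (lists of floats)
    memories: per-device error-memory vectors, overwritten in place
    Returns a noisy, fading-weighted estimate of the average sparsified update.
    """
    d = len(updates[0])
    received = [0j] * d
    for dev, (u, mem) in enumerate(zip(updates, memories)):
        # Error compensation: add back what was dropped in earlier rounds.
        corrected = [a + b for a, b in zip(u, mem)]
        sparse = top_k(corrected, k)
        # Store the residual that is not transmitted this round.
        memories[dev] = [a - b for a, b in zip(corrected, sparse)]
        # Assumed Rayleigh fading coefficient h ~ CN(0, 1), one per device.
        h = complex(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2)
        # Phase-only precoding (assumption): cancel the channel phase so that
        # all devices' signals add coherently, with amplitude gamma * |h|.
        pre = gamma * cmath.exp(-1j * cmath.phase(h))
        for j in range(d):
            received[j] += h * pre * sparse[j]
    # Additive complex Gaussian noise at the server.
    for j in range(d):
        received[j] += complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std))
    n = len(updates)
    # Keep the real part and normalize by the gain and the number of devices.
    return [z.real / (gamma * n) for z in received]


rng = random.Random(0)
memories = [[0.0] * 4, [0.0] * 4]
updates = [[1.0, -3.0, 0.5, 2.0], [0.2, 0.1, -4.0, 0.0]]
estimate = air_round(updates, memories, k=2, gamma=1.0, noise_std=0.1, rng=rng)
```

Because the precoder removes only the phase, the server receives $\sum_i |h_i|\,\tilde{\mathbf{u}}_i$ plus noise, so the output is a fading-weighted, noisy estimate of the average sparsified update rather than the exact average.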

Abstract

For modern artificial intelligence (AI) applications such as large language models (LLMs), the training paradigm has recently shifted to pre-training followed by fine-tuning. Furthermore, owing to dwindling open repositories of data and thanks to efforts to democratize access to AI models, pre-training is expected to increasingly migrate from the current centralized deployments to federated learning (FL) implementations. Meta-learning provides a general framework in which pre-training and fine-tuning can be formalized. Meta-learning-based personalized FL (meta-pFL) moves beyond basic personalization by targeting generalization to new agents and tasks. This paper studies the generalization performance of meta-pFL for a wireless setting in which the agents participating in the pre-training phase, i.e., meta-learning, are connected via a shared wireless channel to the server. Adopting over-the-air computing, we study the trade-off between generalization to new agents and tasks, on the one hand, and convergence, on the other hand. The trade-off arises from the fact that channel impairments may enhance generalization, while degrading convergence. Extensive numerical results validate the theory.

Paper Structure (26 sections, 13 theorems, 77 equations, 10 figures, 3 tables, 1 algorithm)

Introduction
Related Works
Contributions
Preliminaries and System Model
Preliminaries
Communication Model
Over-the-Air Meta-Learning Based Personalized Federated Learning
Air-meta-pFL
Convergence Analysis
Assumptions
Convergence Analysis with Constant Learning Rates
Convergence Analysis with Adaptive Learning Rates
Generalization Analysis
Meta-Generalization Error
Assumptions
...and 11 more sections

Key Result

Theorem 4.1: Convergence with Constant Learning Rate

Under the stated assumptions (from the contraction assumption through the channel-fading assumption), let $\{\boldsymbol{\theta}^{(t)}\}_{t=0}^{T-1}$ be the iterates generated by Air-meta-pFL (Algorithm 1) with constant learning rates $\alpha^{(t)} = \alpha \in (0, 1/L_G]$ and $\eta^{(t)} = \eta$ satisfying … Then, on average over the randomness of SGD, sparsification, device selection, fading, and channel …
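To make the meta-training objective and the stationary error concrete, here is a toy worked sketch under strong simplifying assumptions: scalar quadratic per-device losses $f_i(x) = \tfrac{1}{2} a_i (x - c_i)^2$, a single fine-tuning step $\varphi_i = \theta - \alpha f_i'(\theta)$, exact gradients, and ideal (error-free) communication. It runs gradient descent on $F(\theta) = \frac{1}{N}\sum_i f_i(\varphi_i)$ with a constant $\alpha \le 1/\max_i a_i$ (a stand-in for the paper's $\alpha \le 1/L_G$) and a constant $\eta$, recording $|F'(\theta^{(t)})|^2$ each round. The helper names are hypothetical.

```python
def meta_grad(theta, a, c, alpha):
    """Exact derivative of F(theta) = mean_i f_i(theta - alpha * f_i'(theta))
    for quadratic losses f_i(x) = 0.5 * a_i * (x - c_i) ** 2."""
    g = 0.0
    for ai, ci in zip(a, c):
        # One fine-tuning step gives phi_i = theta - alpha * a_i * (theta - c_i),
        # so f_i(phi_i) = 0.5 * a_i * (1 - alpha * a_i) ** 2 * (theta - c_i) ** 2.
        g += ai * (1 - alpha * ai) ** 2 * (theta - ci)
    return g / len(a)


def run_meta_gd(a, c, alpha, eta, rounds, theta0=0.0):
    """Gradient descent on the meta-objective with constant learning rates."""
    theta = theta0
    errors = []  # stationary error |F'(theta^(t))|^2 at each round t
    for _ in range(rounds):
        g = meta_grad(theta, a, c, alpha)
        errors.append(g ** 2)
        theta -= eta * g
    return theta, errors


theta, errs = run_meta_gd([1.0, 2.0, 4.0], [0.0, 1.0, -1.0],
                          alpha=0.2, eta=1.0, rounds=200)
```

With $a = (1, 2, 4)$, $c = (0, 1, -1)$, and $\alpha = 0.2$, the iterates approach $\theta^\star = \sum_i w_i c_i / \sum_i w_i$ with $w_i = a_i (1 - \alpha a_i)^2$, i.e., $0.56/1.52 \approx 0.368$, while the stationary error decays geometrically. Note that the fine-tuning step reweights devices: a device with larger curvature $a_i$ can carry less weight in the pre-trained $\theta$.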

Figures (10)

Figure 1: In the considered personalized federated learning (pFL) setting, each device $i$ aims to find a fine-tuned model $\boldsymbol{\varphi}_i \in \mathbb{R}^{d}$ using a local data set $\mathcal{D}_i \sim \mu_i^m$ by communicating with the central server. The central server maintains a hyperparameter $\boldsymbol{\theta} \in \mathbb{R}^{d}$, representing a pre-trained model, which is updated based on information sent by the devices.
Figure 2: Illustration of the implementation protocol of over-the-air meta-pFL.
Figure 3: Illustration of the experimental setting for the Omniglot data set ($N=5, K=8, m_B=32, m_c=136$ or $10$).
Figure 4: Stationary convergence error, i.e., the squared norm of the gradient of the meta-training loss, versus global round $t$ for meta-pFL, which assumes ideal communication, and for Air-meta-pFL. The shaded error bars correspond to intervals covering 95% of the realized values obtained from 10 Monte Carlo trials.
Figure 5: Stationary convergence error versus received SNR for Air-meta-pFL and meta-pFL.
...and 5 more figures
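The 95% bands described for Figure 4 can be computed per round across the Monte Carlo trials. The sketch below assumes the bands are the empirical 2.5% and 97.5% percentiles of the realized values at each round (the caption does not specify the estimator); its linear interpolation matches NumPy's default percentile method. The function names are hypothetical.

```python
def percentile(values, q):
    """Linear-interpolation percentile of a list of floats, q in [0, 100]."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    frac = pos - lo
    return xs[lo] + (xs[hi] - xs[lo]) * frac


def band_per_round(trials, coverage=95.0):
    """trials[r][t] is the metric of Monte Carlo trial r at round t.

    Returns (lower, upper) lists whose interval at each round covers
    `coverage` percent of the realized values, with equal mass in each tail.
    """
    tail = (100.0 - coverage) / 2.0
    lower, upper = [], []
    for t in range(len(trials[0])):
        column = [trial[t] for trial in trials]
        lower.append(percentile(column, tail))
        upper.append(percentile(column, 100.0 - tail))
    return lower, upper


# Ten hypothetical trials, two rounds each.
trials = [[float(r), 2.0 * r] for r in range(10)]
lower, upper = band_per_round(trials)
```

With only 10 trials, the 2.5% and 97.5% endpoints fall close to the sample minimum and maximum, so such bands are coarse.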

Theorems & Definitions (14)

Theorem 4.1: Convergence with Constant Learning Rate
Theorem 4.2: Convergence with Adaptive Learning Rate
Lemma 5.1: Theorem 1 in jose2021information
Theorem 5.1: Generalization of Air-meta-pFL
Remark 5.1
Lemma 1: Young's inequality
Lemma 2: Lemma 4.2 in fallah2020personalized
Lemma 3: Lemma 4.3 in fallah2020personalized
Lemma 4: Lemma 4.4 in fallah2020personalized
Lemma 5
...and 4 more
