Pre-Training and Personalized Fine-Tuning via Over-the-Air Federated Meta-Learning: Convergence-Generalization Trade-Offs
Haifeng Wen, Hong Xing, Osvaldo Simeone
TL;DR
This work studies a wireless implementation of meta-learning-based personalized federated learning (meta-pFL) based on over-the-air computation, i.e., over-the-air federated learning (AirFL). It introduces Air-meta-pFL, a protocol that combines sparsification, memory-based error compensation, and phase-aware scaling to send hyperparameter updates over a shared wireless multiple access channel (MAC), and it derives convergence bounds for both constant and adaptive learning rates. It also establishes a mutual-information-based generalization bound, which reveals a trade-off: wireless impairments can improve generalization while potentially slowing convergence. Numerical experiments on Omniglot and other image datasets validate the theory, showing how data heterogeneity and communication resources shape both convergence and generalization in a practical wireless setting.
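To make the protocol components concrete, here is a minimal NumPy sketch of one over-the-air aggregation round. It is not the authors' Air-meta-pFL implementation. It rests on several assumptions: sparsification is top-k by magnitude; error compensation is an error-feedback memory that stores the entries dropped in each round; and "phase-aware scaling" is read as each agent rotating its signal by the conjugate phase of its own channel coefficient so that contributions add coherently at the server. The function names `top_k_sparsify` and `air_meta_round`, the complex Gaussian noise model, and the real-part decoding at the server are all hypothetical choices.

```python
import numpy as np


def top_k_sparsify(v, k):
    """Keep the k largest-magnitude entries of v and zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out


def air_meta_round(updates, memories, channels, k, noise_std, rng):
    """One hypothetical over-the-air aggregation round (illustrative only).

    updates:   list of real local hyperparameter updates, each of shape (d,)
    memories:  list of error-compensation memories, same shapes as updates
    channels:  complex fading coefficients h_i, one per agent
    k:         number of entries each agent transmits
    noise_std: standard deviation of the complex receiver noise
    """
    d = updates[0].shape[0]
    received = np.zeros(d, dtype=complex)
    new_memories = []
    for u, m, h in zip(updates, memories, channels):
        compensated = u + m                        # add back mass dropped in earlier rounds
        sparse = top_k_sparsify(compensated, k)    # transmit only k entries
        new_memories.append(compensated - sparse)  # remember what was not sent
        # Assumed phase-aware scaling: cancel the channel phase, keep its magnitude.
        tx = sparse * np.exp(-1j * np.angle(h))
        received += h * tx                         # signals superpose on the MAC
    noise = noise_std * (rng.standard_normal(d) + 1j * rng.standard_normal(d)) / np.sqrt(2)
    received += noise
    # The server keeps the real part and normalizes by the number of agents.
    aggregate = np.real(received) / len(updates)
    return aggregate, new_memories
```

Because only the phase is corrected, the server receives a sum of the sparsified updates weighted by the channel magnitudes |h_i|, plus noise. These residual distortions are the kind of channel impairment that the analysis links to the convergence-generalization trade-off; with unit-magnitude channels, no sparsification (k = d), and zero noise, the sketch reduces to plain averaging.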
Abstract
For modern artificial intelligence (AI) applications such as large language models (LLMs), the training paradigm has recently shifted to pre-training followed by fine-tuning. Furthermore, owing to dwindling open repositories of data and thanks to efforts to democratize access to AI models, pre-training is expected to increasingly migrate from the current centralized deployments to federated learning (FL) implementations. Meta-learning provides a general framework in which pre-training and fine-tuning can be formalized. Meta-learning-based personalized FL (meta-pFL) moves beyond basic personalization by targeting generalization to new agents and tasks. This paper studies the generalization performance of meta-pFL in a wireless setting in which the agents participating in the pre-training phase, i.e., meta-learning, are connected to the server via a shared wireless channel. Adopting over-the-air computation, we study the trade-off between generalization to new agents and tasks, on the one hand, and convergence, on the other. The trade-off arises because channel impairments may enhance generalization while degrading convergence. Extensive numerical results validate the theory.
