Only Send What You Need: Learning to Communicate Efficiently in Federated Multilingual Machine Translation
Yun-Wei Chu, Dong-Jun Han, Christopher G. Brinton
TL;DR
The paper tackles the high communication cost of federated multilingual NMT. It proposes MetaSend, a system based on Model-Agnostic Meta-Learning (MAML) that learns a dynamic per-round transmission threshold $\theta^r$ to decide which NMT tensor updates each client sends, using a per-tensor deviation $dev$ and two sending modes, $MetaSend_g$ and $MetaSend_l$. Under fixed communication budgets, the approach significantly improves translation quality (measured by SacreBLEU and COMET) while reducing the volume of transmitted tensors on the MTNT and UNMT datasets, for both IID and non-IID client distributions. The work demonstrates the practical potential of bandwidth-aware, privacy-preserving FL for multilingual NMT and lays a foundation for integration with pruning and other efficiency techniques.
Abstract
Federated learning (FL) is a promising distributed machine learning paradigm that enables multiple clients to collaboratively train a global model. In this paper, we focus on a practical federated multilingual learning setup where clients with their own language-specific data aim to collaboratively construct a high-quality neural machine translation (NMT) model. However, communication constraints in practical network systems present challenges for exchanging large-scale NMT engines between FL parties. We propose a meta-learning-based adaptive parameter selection methodology, MetaSend, that improves the communication efficiency of model transmissions from clients during FL-based multilingual NMT training. Our approach learns a dynamic threshold for filtering parameters prior to transmission without compromising the NMT model quality, based on the tensor deviations of clients between different FL rounds. Through experiments on two NMT datasets with different language distributions, we demonstrate that MetaSend obtains substantial improvements over baselines in translation quality in the presence of a limited communication budget.
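To make the filtering step concrete, the following is a minimal sketch of threshold-based tensor selection on a single client. It is not the authors' implementation. It assumes a hypothetical deviation measure (mean absolute change of a tensor between two rounds), assumes that a tensor is sent when its deviation exceeds the threshold, and uses a fixed value `theta_r` in place of the threshold that MetaSend learns through meta-learning. The names `deviation`, `select_tensors`, and the toy tensor names are illustrative only.

```python
from typing import Dict, List

# Flattened tensors as plain float lists, for illustration only.
Tensor = List[float]


def deviation(prev: Tensor, curr: Tensor) -> float:
    """Hypothetical deviation: mean absolute change between two rounds.

    The paper's exact definition of dev may differ; this is an assumption.
    """
    if len(prev) != len(curr):
        raise ValueError("tensor shapes must match")
    return sum(abs(c - p) for p, c in zip(prev, curr)) / len(curr)


def select_tensors(
    prev_model: Dict[str, Tensor],
    curr_model: Dict[str, Tensor],
    threshold: float,
) -> Dict[str, Tensor]:
    """Keep only tensors whose deviation exceeds the threshold.

    Assumption: 'deviation greater than threshold' is the sending rule.
    """
    return {
        name: curr
        for name, curr in curr_model.items()
        if deviation(prev_model[name], curr) > threshold
    }


# Toy client state from the previous and the current FL round.
prev_model = {
    "encoder.w": [0.0, 0.0, 0.0, 0.0],
    "decoder.w": [1.0, 1.0],
    "embed": [0.5, 0.5],
}
curr_model = {
    "encoder.w": [0.5, -0.5, 0.5, -0.5],  # deviation 0.5
    "decoder.w": [1.0, 1.25],             # deviation 0.125
    "embed": [0.5, 0.5],                  # deviation 0.0
}

theta_r = 0.25  # fixed here; MetaSend would learn this value per round
payload = select_tensors(prev_model, curr_model, theta_r)
print(sorted(payload))  # ['encoder.w']
```

In this toy round only `encoder.w` is transmitted, so the client sends 4 of its 8 parameter values. Lowering the threshold sends more tensors and raising it sends fewer, which is the trade-off a learned per-round threshold is meant to balance against translation quality.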
