
Only Send What You Need: Learning to Communicate Efficiently in Federated Multilingual Machine Translation

Yun-Wei Chu, Dong-Jun Han, Christopher G. Brinton

TL;DR

The paper tackles the practical problem of high communication costs in federated multilingual NMT. It proposes MetaSend, a Model-Agnostic Meta-Learning (MAML) based system that produces a dynamic per-round transmission threshold $\theta^r$ to decide which NMT tensor updates to send, using per-tensor deviation $dev$ and two sending modes $MetaSend_g$ and $MetaSend_l$. The approach yields significant improvements in translation quality (as measured by SacreBLEU and COMET) and reductions in transmitted tensor volume under fixed budgets, across MTNT and UNMT datasets, including both IID and non-IID client distributions. The work demonstrates the practical potential of bandwidth-aware, privacy-preserving FL for multilingual NMT and establishes a foundation for further integration with pruning and other efficiency techniques.

Abstract

Federated learning (FL) is a promising distributed machine learning paradigm that enables multiple clients to collaboratively train a global model. In this paper, we focus on a practical federated multilingual learning setup where clients with their own language-specific data aim to collaboratively construct a high-quality neural machine translation (NMT) model. However, communication constraints in practical network systems present challenges for exchanging large-scale NMT engines between FL parties. We propose a meta-learning-based adaptive parameter selection methodology, MetaSend, that improves the communication efficiency of model transmissions from clients during FL-based multilingual NMT training. Our approach learns a dynamic threshold for filtering parameters prior to transmission without compromising the NMT model quality, based on the tensor deviations of clients between different FL rounds. Through experiments on two NMT datasets with different language distributions, we demonstrate that MetaSend obtains substantial improvements over baselines in translation quality in the presence of a limited communication budget.
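
To make the filtering step concrete, here is a minimal sketch of threshold-based tensor selection on a single client. It is an illustration under stated assumptions, not the authors' implementation: the helper names `deviation` and `select_tensors` are hypothetical, the deviation is assumed to be the mean absolute difference between a tensor's values in two consecutive rounds, and the threshold is passed in as a plain number rather than learned by the meta-learning module. The text above does not say which side of the threshold is transmitted, so the sketch exposes that choice as a `send_above` flag.

```python
from typing import Dict, List

Tensor = List[float]  # flattened tensor values, for illustration only


def deviation(prev: Tensor, curr: Tensor) -> float:
    """Mean absolute difference between two rounds (assumed definition)."""
    if len(prev) != len(curr) or not curr:
        raise ValueError("tensors must be non-empty and have matching sizes")
    return sum(abs(c - p) for p, c in zip(prev, curr)) / len(curr)


def select_tensors(
    prev_model: Dict[str, Tensor],
    curr_model: Dict[str, Tensor],
    threshold: float,
    send_above: bool = True,  # assumption: which side is sent is not stated
) -> Dict[str, Tensor]:
    """Return only the tensors a client would transmit this round."""
    selected = {}
    for name, curr in curr_model.items():
        dev = deviation(prev_model[name], curr)
        # Keep the tensor if it falls on the chosen side of the threshold.
        if (dev > threshold) == send_above:
            selected[name] = curr
    return selected


# Hypothetical two-tensor model: one tensor changed a lot, one barely moved.
prev_round = {"encoder.w": [0.0, 0.0], "decoder.w": [1.0, 1.0]}
curr_round = {"encoder.w": [0.5, -0.5], "decoder.w": [1.0, 1.01]}
to_send = select_tensors(prev_round, curr_round, threshold=0.1)
print(sorted(to_send))
```

In this sketch the threshold is fixed by hand; in MetaSend it is produced per round by the learned module, which is what lets it track the changing deviation distribution across FL rounds.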

Paper Structure (30 sections, 5 equations, 11 figures, 12 tables, 2 algorithms)


Figures (11)

  • Figure 1: Sample histograms of the difference (absolute-value norms) between tensors of NMT engines computed for clients across consecutive communication rounds in FL training. The traditional method (red thresholds) fails to accurately capture the boundary between clusters during sending, while our MetaSend (blue thresholds) provides a dynamic threshold that adapts to the varying distribution across FL rounds.
  • Figure 2: Overview of MetaSend for federated NMT. MetaSend allows clients to adaptively select key NMT model parameters using a learned threshold per round, sending only a subset of tensors to the server for aggregation and improving efficiency under a limited communication budget.
  • Figure 3: Optimization of our MAML module in an FL setup involves adapting the sending threshold based on NMT model quality. The threshold is initially applied to every client, resulting in selected parameters for each client. Subsequently, clients send the resulting parameters to the server for aggregating a global model. After aggregating the client models, we evaluate the global model using validation sets from each client. Finally, the MAML module performs a meta-update based on the evaluated loss.
  • Figure 4: The generations (Ru $\to$ Zh) of ${\text{MetaSend}_{l}}$ under IID and Non-IID FL training. As with most FL methods, models under Non-IID FL training need more training rounds to stabilize. Before convergence, the generations of the early model in Non-IID FL training consist of different languages. (blue: Zh, red: Es, green: Fr)
  • Figure 5: (a) Translation examples (Ru $\to$ Zh) of ${\text{DP}_{l}}$, ${\text{MetaSend}_{l}}$, and ground truth. Our method aligns better with ground truth, and ${\text{DP}_{l}}$ generates redundant tokens. (b) Translation examples (Ru $\to$ Zh) of ${\text{DP}_{l}}$, ${\text{MetaSend}_{l}}$, and ground truth. Our method generates the same sentiment-meaning word as ground truth, while ${\text{DP}_{l}}$ generates similar but different sentiment-meaning words.
  • ...and 6 more figures