Low-Resource Machine Translation through the Lens of Personalized Federated Learning

Viktor Moskvoretskii, Nazarii Tupitsa, Chris Biemann, Samuel Horváth, Eduard Gorbunov, Irina Nikishina

TL;DR

The paper addresses low-resource machine translation by introducing MeritOpt, a framework inspired by Personalized Federated Learning that learns from multiple language datasets to optimize a target translation distribution. It constructs a weighted gradient from auxiliary languages using aggregation weights $w^{t+1} \in \Delta_1^n$ and updates the model via a base optimizer $\mathrm{OptStep}$, with $w^{t+1}$ chosen to minimize a target validation loss $f_{\hat{\bm{D}}}$ using Stochastic Mirror Descent. The authors provide theoretical convergence guarantees and demonstrate empirical gains on South East Asian and Sami Finno-Ugric translation tasks, achieving 2–10x fewer gradient steps than baselines in several settings. The approach yields interpretable insights into language interactions, shows regularization from unrelated data, and offers reproducible scripts for replication and further exploration in multilingual NLP.
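The weight-selection step described above can be illustrated with a minimal NumPy sketch. It assumes quadratic per-language losses, plain SGD as a hypothetical stand-in for $\mathrm{OptStep}$, an SGD-style look-ahead $f_{\hat{\bm{D}}}(x - \gamma \sum_i w_i g_i)$ as the auxiliary objective, a deterministic (full-gradient) version of the mirror-descent solver, and weights warm-started across steps. The helper names `mirror_descent_weights` and `meritopt_step` are hypothetical and are not taken from the MeritOpt repository. On the simplex, mirror descent with the negative-entropy mirror map reduces to the multiplicative (exponentiated-gradient) update used here.

```python
import numpy as np


def mirror_descent_weights(x, grads, val_grad_fn, gamma, w, eta=0.5, steps=20):
    """Approximately solve  min_{w in simplex}  f_val(x - gamma * sum_i w_i g_i)
    with entropic mirror descent (exponentiated gradient), warm-started at w.
    Deterministic sketch: uses the full validation gradient, not a stochastic one."""
    for _ in range(steps):
        y = x - gamma * (w @ grads)              # look-ahead point for the current weights
        # chain rule: d f_val(y) / d w_i = -gamma * <grad f_val(y), g_i>
        grad_w = -gamma * (grads @ val_grad_fn(y))
        w = w * np.exp(-eta * grad_w)            # multiplicative (mirror) step
        w = w / w.sum()                          # renormalise onto the simplex
    return w


def meritopt_step(x, grads, val_grad_fn, gamma, w):
    """One outer step: choose weights, aggregate gradients, apply the base optimizer.
    Plain SGD is a hypothetical stand-in for OptStep."""
    w = mirror_descent_weights(x, grads, val_grad_fn, gamma, w)
    g = w @ grads                                # weighted aggregate gradient
    return x - gamma * g, w


# Toy problem (hypothetical): dataset i has loss 0.5 * ||x - c_i||^2, gradient x - c_i.
centres = np.array([[1.0, 1.0],    # target language
                    [1.0, 1.0],    # related auxiliary language (same minimiser)
                    [-5.0, 5.0]])  # unrelated auxiliary language
target_val = centres[0]            # validation loss shares the target minimiser


def val_grad(y):
    """Gradient of the target validation loss 0.5 * ||y - target_val||^2."""
    return y - target_val


x = np.zeros(2)
w = np.full(3, 1.0 / 3.0)          # start from uniform weights
for _ in range(50):
    grads = x - centres            # per-dataset gradients at x, shape (3, 2)
    x, w = meritopt_step(x, grads, val_grad, gamma=0.5, w=w)

print(np.round(w, 4), np.round(x, 4))
```

In this toy setup the weight on the unrelated language shrinks toward zero while the target and the related language keep equal weight, so the iterate approaches the target minimiser. It is an illustration of the weighting mechanism only, not a reproduction of the paper's experiments.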

Abstract

We present a new approach called MeritOpt, based on the Personalized Federated Learning algorithm MeritFed, that can be applied to natural language tasks with heterogeneous data. We evaluate it on the Low-Resource Machine Translation task, using datasets of South East Asian and Finno-Ugric languages. In addition to being effective, MeritOpt is highly interpretable, as it can be used to track the impact of each language used for training. Our analysis reveals that target dataset size affects the weight distribution across auxiliary languages, that unrelated languages do not interfere with training, and that auxiliary optimizer parameters have minimal impact. Our approach is easy to apply with a few lines of code, and we provide scripts for reproducing the experiments at https://github.com/VityaVitalich/MeritOpt.

Paper Structure

This paper contains 36 sections, 2 theorems, 24 equations, 6 figures, and 6 tables.

Key Result

Theorem 1

Let Assumptions as:bounded-var (bounded variance), as:lipschitzness, and as:boundedness hold. If the auxiliary problem in line lst:line:aux_problem of the algorithm is solved with error $\delta \geq 0$ (see eq:aux_approx), then MeritOpt-RMSProp with $\gamma_t = \gamma \leq \frac{\epsilon}{2L}$ and $\beta_2 \geq 1 - \frac{\epsilon^2}{16G^2}$, after $T$ iterations, s…
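The two hyperparameter conditions in Theorem 1 can be evaluated numerically. The sketch below reads $\epsilon$ as the target accuracy, $L$ as the constant from the Lipschitzness assumption, and $G$ as the bound from the boundedness assumption; these readings are assumptions, since the statement above is truncated. The function name and the sample values are hypothetical.

```python
def meritopt_rmsprop_hyperparams(eps, L, G):
    """Largest step size and smallest beta_2 allowed by the conditions quoted in Theorem 1.
    The meanings of eps, L and G are assumed (the theorem statement is truncated)."""
    gamma_max = eps / (2.0 * L)                   # gamma <= eps / (2L)
    beta2_min = 1.0 - eps ** 2 / (16.0 * G ** 2)  # beta_2 >= 1 - eps^2 / (16 G^2)
    return gamma_max, beta2_min


# Hypothetical values: eps = 0.1, L = 10, G = 5.
gamma_max, beta2_min = meritopt_rmsprop_hyperparams(eps=0.1, L=10.0, G=5.0)
print(gamma_max, beta2_min)
```

Because $1 - \beta_2$ scales with $\epsilon^2$ while $\gamma$ scales with $\epsilon$, halving $\epsilon$ halves the admissible step size but requires $\beta_2$ to move four times closer to 1.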

Figures (6)

  • Figure 1: Weight distribution for South East Asian languages. Target languages and data sizes are given in the captions.
  • Figure 2: Weight distribution across Finno-Samic languages. Target languages are given in the captions.
  • Figure 3: Weight distribution for the target language Indonesian with unrelated Hungarian included.
  • Figure 4: Weight distribution across languages for target Indonesian trained on a small subset.
  • Figure 5: Weights for the target language (Javanese-small) with different Mirror Descent parameters.
  • ...and 1 more figure

Theorems & Definitions (4)

  • Theorem 1
  • Proof
  • Theorem 2
  • Proof