FIRM: Federated In-client Regularized Multi-objective Alignment for Large Language Models

Fatemeh Nourzad, Amirhossein Roknilamouki, Eylem Ekici, Jia Liu, Ness B. Shroff

TL;DR

This work introduces FIRM (Federated In-client Regularized Multi-objective alignment), a novel algorithm that mitigates client disagreement drift while remaining communication-efficient, and, to our knowledge, gives the first finite-time convergence guarantees for this federated multi-objective alignment setting.

Abstract

Aligning Large Language Models (LLMs) with human values often involves balancing multiple, conflicting objectives such as helpfulness and harmlessness. Training these models is computationally intensive, and centralizing the process raises significant data privacy concerns. Federated Learning (FL) offers a compelling alternative, but existing Federated Multi-Objective Optimization (FMOO) methods face severe communication bottlenecks: they rely on each client transmitting multiple gradients to the server, which does not scale to large models. We introduce FIRM (Federated In-client Regularized Multi-objective alignment), a novel algorithm that mitigates client disagreement drift while remaining communication-efficient. In FIRM, each client locally solves a regularized multi-objective optimization problem. Because this in-client regularization directly mitigates client disagreement drift, our method eliminates the multi-gradient transmissions common in prior work. Consequently, clients need only transmit a single set of adapted parameters, which keeps communication costs low. We prove that our algorithm converges to Pareto-stationary points and, to our knowledge, provide the first finite-time convergence guarantees for this federated multi-objective alignment setting. Empirically, we show that FIRM yields smoother training dynamics, reduced client disagreement drift, and improved reward trade-offs compared to baselines. We further propose a method to incorporate a preference over the objectives and report empirical Pareto plots, demonstrating that FIRM can smoothly adapt trade-offs between objectives in response to specified preferences.
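To make "in-client regularization" more concrete, here is a minimal sketch of one possible client-side weighting step for two objectives. It assumes, as a hypothetical simplification, that the regularizer is a quadratic penalty on the MGDA weights added to the standard min-norm subproblem, $\min_{w\in[0,1]} \tfrac12\|w g_1 + (1-w) g_2\|^2 + \tfrac{\beta}{2}\left(w^2 + (1-w)^2\right)$; the paper's exact regularized problem may differ. The function names `regularized_mgda_weight` and `common_direction` are illustrative, and the gradients are plain Python lists standing in for per-objective policy gradients.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def regularized_mgda_weight(g1, g2, beta):
    """Weight w on objective 1 (objective 2 gets 1 - w).

    Minimizes 0.5*||w*g1 + (1-w)*g2||^2 + 0.5*beta*(w**2 + (1-w)**2)
    over w in [0, 1]. beta = 0 recovers the standard two-objective MGDA
    rule; larger beta pulls w toward 0.5. The penalty form is an ASSUMED
    stand-in for FIRM's in-client regularizer, not the paper's definition.
    """
    diff = [a - b for a, b in zip(g1, g2)]      # g1 - g2
    denom = dot(diff, diff) + 2.0 * beta        # curvature of the 1-D quadratic
    if denom == 0.0:                            # identical gradients and beta = 0
        return 0.5
    w = (beta - dot(diff, g2)) / denom          # unconstrained stationary point
    return min(1.0, max(0.0, w))                # project onto [0, 1]


def common_direction(g1, g2, beta):
    """Convex combination of the two objective gradients used for a local step."""
    w = regularized_mgda_weight(g1, g2, beta)
    return [w * a + (1.0 - w) * b for a, b in zip(g1, g2)]
```

For example, with `g1 = [2.0, 0.0]` and `g2 = [0.0, 1.0]`, `beta = 0` gives `w = 0.2` while `beta = 10` gives `w = 0.44`: the penalty damps extreme weights, which is one way a regularizer could produce the smoother weight trajectories that Figure 2 attributes to $\beta > 0$.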


Paper Structure

This paper contains 60 sections, 11 theorems, 75 equations, 6 figures, 1 table, 3 algorithms.

Key Result

Theorem 1

Under the stated assumptions and with an appropriately chosen step size $\alpha$, the iterates produced by TFIRM satisfy: …

Proof: See Appendix F.
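Pareto stationarity is usually measured by the smallest norm of a convex combination of the objective gradients. Under a common definition, $\theta$ is $\epsilon$-Pareto stationary when $\min_{\lambda \in \Delta_M} \|\sum_{m=1}^{M} \lambda_m \nabla f_m(\theta)\|^2 \le \epsilon$; the paper's Definition 3 may normalize this differently (for example, without the square), so treat the criterion below as an assumption. The sketch solves the min-norm problem with the Frank-Wolfe iteration commonly used for MGDA, working only with the Gram matrix of the gradients. `min_norm_weights` and `is_eps_pareto_stationary` are hypothetical names.

```python
def min_norm_weights(grads, iters=200, tol=1e-12):
    """Frank-Wolfe solver for min over the simplex of ||sum_m lam_m * g_m||^2.

    Returns (lam, squared_norm). Only inner products between gradients are
    used, so the gradient dimension enters through the Gram matrix.
    """
    m = len(grads)
    gram = [[sum(x * y for x, y in zip(gi, gj)) for gj in grads] for gi in grads]
    lam = [1.0 / m] * m                                  # start at the simplex center
    for _ in range(iters):
        g_lam = [sum(gram[i][j] * lam[j] for j in range(m)) for i in range(m)]
        a = sum(lam[i] * g_lam[i] for i in range(m))    # ||v||^2 with v = sum lam_i g_i
        t = min(range(m), key=lambda i: g_lam[i])       # vertex minimizing <g_t, v>
        b, c = g_lam[t], gram[t][t]                      # <v, g_t> and ||g_t||^2
        denom = a - 2.0 * b + c                          # ||v - g_t||^2
        if denom <= tol:
            break
        step = min(1.0, max(0.0, (a - b) / denom))       # exact line search toward g_t
        if step <= tol:                                  # no descent: current lam is optimal
            break
        lam = [(1.0 - step) * l for l in lam]
        lam[t] += step
    g_lam = [sum(gram[i][j] * lam[j] for j in range(m)) for i in range(m)]
    sq_norm = sum(lam[i] * g_lam[i] for i in range(m))
    return lam, sq_norm


def is_eps_pareto_stationary(grads, eps):
    """ASSUMED criterion: squared min-norm of the gradient combination is at most eps."""
    _, sq_norm = min_norm_weights(grads)
    return sq_norm <= eps
```

Opposing gradients such as `[1, 0]` and `[-1, 0]` admit a zero-norm combination, so the point is Pareto stationary for any $\epsilon \ge 0$; orthogonal unit gradients give a squared min-norm of 0.5, so the point is not stationary for smaller $\epsilon$.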

Figures (6)

  • Figure 1: Performance comparison between FIRM (orange) and the FedCMOO-A baseline (blue). All curves show mean performance across 8 clients. Panels (a,b): reward trajectories, smoothed with EMA (half-life=20), where FIRM achieves higher, more stable helpfulness with comparable harmlessness. Panels (c,d): MGDA weights, showing that FIRM yields smoother, more consistent trade-off decisions than FedCMOO-A.
  • Figure 2: Reward trajectories and MGDA weights under $\beta=0$ (orange) and $\beta=0.05$ (blue). All panels (a,b,c,d) are smoothed with EMA (half-life=20). Without regularization ($\beta=0$), harmlessness remains low and helpfulness plateaus near 0.46, while MGDA weights fluctuate erratically across clients (c,d). With $\beta=0.05$, FIRM achieves more favorable trade-offs and exhibits smoother, more consistent weight evolution, reducing client drift.
  • Figure 3: FIRM navigates the Helpfulness-Harmlessness trade-off. Each marker is a global model trained with a different preference vector $\mathbf{p}$.
  • Figure 4: Robustness of FIRM to Heterogeneous Reward Models. This figure compares a homogeneous setup (all clients use the "Same RMs") with a heterogeneous one ("Different RMs"). (a, b): The top row shows that the learned MGDA weights are stable, with nearly identical convergence dynamics in both settings, indicating that the aggregation mechanism is robust to reward-model heterogeneity. (c, d): The bottom row shows closely matched, competitive reward trajectories: FIRM maintains strong helpfulness and harmlessness even when clients receive diverse reward signals.
  • Figure 5: Robustness of FIRM to Non-IID Data Distribution. This figure compares the IID setting with a challenging non-IID configuration (Dirichlet partition with concentration $\alpha=0.3$), demonstrating FIRM's resilience to statistical heterogeneity. (a, b): The top row shows that the learned MGDA weights are stable, with nearly identical convergence dynamics in the IID and non-IID settings, indicating that the global aggregation is robust. (c, d): The bottom row shows the resulting reward trajectories. FIRM attains similar harmlessness rewards in both scenarios. Helpfulness is higher in the IID case, but the model still achieves gains under the non-IID setting, indicating that learning remains effective under heterogeneity.
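The non-IID configuration in Figure 5 is described only as "Dirichlet, $\alpha=0.3$". A common way to build such splits in federated learning is label-skew partitioning: for each label value (here, any discrete attribute of the training prompts), draw client shares from $\mathrm{Dir}(\alpha)$ and deal that label's examples out in those proportions. The sketch below assumes this scheme and a hypothetical `labels` list; the paper's partitioning attribute and procedure may differ. This $\alpha$ is a Dirichlet concentration parameter, unrelated to the step size $\alpha$ in Theorem 1.

```python
import random
from collections import defaultdict


def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split example indices across clients with Dirichlet(alpha) label skew.

    For every label value, client shares are drawn from Dir(alpha) and that
    label's examples are assigned in those proportions.
    Returns one list of example indices per client.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_label[lab].append(idx)

    clients = [[] for _ in range(num_clients)]
    for lab in sorted(by_label):
        idxs = by_label[lab]
        rng.shuffle(idxs)
        # A Dir(alpha) sample is a vector of Gamma(alpha, 1) draws, normalized.
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(draws)
        shares = [d / total for d in draws]
        # Turn cumulative shares into nondecreasing cut points over this label.
        cuts, acc = [], 0.0
        for s in shares[:-1]:
            acc += s
            cuts.append(int(round(acc * len(idxs))))
        bounds = [0] + cuts + [len(idxs)]
        for k in range(num_clients):
            clients[k].extend(idxs[bounds[k]:bounds[k + 1]])
    return clients
```

Every example is assigned to exactly one client. With $\alpha = 0.3$ most of a label's examples typically land on a few clients, whereas a large $\alpha$ approaches an even, IID-like split.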

Theorems & Definitions (27)

  • Definition 1: Client MOMDP
  • Definition 2: Pareto Optimality
  • Definition 3: $\epsilon$-Pareto Stationarity
  • Theorem 1: Convergence of TFIRM
  • Remark 1: Controlling Disagreement Drift
  • Remark 2
  • Remark 3: Multi-Objective Disagreement Drift
  • Lemma 1: Regularization Controls Disagreement
  • Lemma 2
  • Lemma 3: Critic Convergence (Theorem 1 from xu2020improving)
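Definition 2 and the Pareto plots in Figure 3 rest on Pareto dominance. Under the usual definition for maximized rewards, one point dominates another if it is at least as good on every objective and strictly better on at least one; the empirical front is the set of non-dominated points. A minimal sketch, assuming each trained global model is summarized by a hypothetical `(helpfulness, harmlessness)` reward tuple:

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b when every objective is maximized."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep the points that no other point dominates (simple O(n^2) scan).

    A point never dominates itself, so no self-exclusion is needed.
    """
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

With made-up rewards `[(0.5, 0.2), (0.4, 0.6), (0.3, 0.5), (0.6, 0.1), (0.5, 0.1)]`, the front keeps `(0.5, 0.2)`, `(0.4, 0.6)`, and `(0.6, 0.1)`; the other two are each dominated by at least one kept point.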