FedRot-LoRA: Mitigating Rotational Misalignment in Federated LoRA

Haoran Zhang, Dongjun Kim, Seohyeon Cha, Haris Vikalo

TL;DR

FedRot-LoRA is a federated LoRA framework that aligns client updates via orthogonal transformations prior to aggregation. The alignment preserves the semantic update while reducing cross-client subspace mismatch, without increasing communication cost or restricting model expressivity.

Abstract

Federated LoRA provides a communication-efficient mechanism for fine-tuning large language models on decentralized data. In practice, however, a discrepancy between the factor-wise averaging used to preserve low rank and the mathematically correct aggregation of local updates can cause significant aggregation error and unstable training. We argue that a major source of this problem is rotational misalignment, arising from the rotational invariance of low-rank factorizations -- semantically equivalent updates can be represented in different latent subspaces across clients since $(B_i R_i)(R_i^\top A_i) = B_i A_i$. When such misaligned factors are averaged directly, they interfere destructively and degrade the global update. To address this issue, we propose FedRot-LoRA, a federated LoRA framework that aligns client updates via orthogonal transformations prior to aggregation. This alignment preserves the semantic update while reducing cross-client subspace mismatch, without increasing communication cost or restricting model expressivity. We provide a convergence analysis that examines the aggregation error induced by factor-wise averaging and shows how rotational alignment yields a tighter upper bound on this error. Extensive experiments on natural language understanding and generative tasks demonstrate that FedRot-LoRA consistently outperforms existing federated LoRA baselines across a range of heterogeneity levels and LoRA ranks.
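The rotational invariance and the effect of alignment described above can be checked numerically. The following is a minimal sketch, not the authors' algorithm: it assumes NumPy, uses made-up dimensions, and picks each rotation $R_i$ by solving an orthogonal Procrustes problem against the first client's $B$ factor. That choice of reference and solver is an assumption for illustration only. All clients here hold rotated copies of the same update, so alignment removes the aggregation error entirely; with genuinely heterogeneous client updates one would expect a reduction rather than elimination.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, n_clients = 8, 6, 2, 4  # hypothetical sizes

def random_orthogonal(r, rng):
    # QR of a Gaussian matrix yields an orthogonal factor Q.
    q, _ = np.linalg.qr(rng.standard_normal((r, r)))
    return q

# One shared update B A; each client holds a differently rotated factorization.
B = rng.standard_normal((d_out, r))
A = rng.standard_normal((r, d_in))
clients = []
for _ in range(n_clients):
    R = random_orthogonal(r, rng)
    clients.append((B @ R, R.T @ A))  # (B R)(R^T A) = B A

def factor_avg_error(pairs):
    # Frobenius gap between factor-wise averaging and the exact mean of products.
    B_bar = np.mean([b for b, _ in pairs], axis=0)
    A_bar = np.mean([a for _, a in pairs], axis=0)
    exact = np.mean([b @ a for b, a in pairs], axis=0)
    return np.linalg.norm(B_bar @ A_bar - exact)

def procrustes_rotation(B_i, B_ref):
    # Orthogonal R minimizing ||B_i R - B_ref||_F (assumed alignment rule).
    u, _, vt = np.linalg.svd(B_i.T @ B_ref)
    return u @ vt

B_ref = clients[0][0]  # assumed reference: the first client's B factor
aligned = []
for B_i, A_i in clients:
    R_i = procrustes_rotation(B_i, B_ref)
    aligned.append((B_i @ R_i, R_i.T @ A_i))

naive_err = factor_avg_error(clients)
aligned_err = factor_avg_error(aligned)
print(f"naive factor averaging error:   {naive_err:.4f}")
print(f"aligned factor averaging error: {aligned_err:.2e}")
```

Because each $R_i$ is orthogonal, the product $B_i A_i$ of every client is unchanged by the alignment step, and the factors keep their original shapes, so nothing extra needs to be transmitted in this sketch.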

Paper Structure

This paper contains 23 sections, 4 theorems, 67 equations, 3 figures, 10 tables, 1 algorithm.

Key Result

Theorem 4.4

Let Assumptions assumption:L, assumption:G, and assump:gauge hold, and let $f^*=\inf_W f(W)$. For a learning rate $\eta$, the following stationarity bound holds for the aggregated global model $W^t = W_0 + \bar{B}^t \bar{A}^t$ over $T$ communication rounds:

Figures (3)

  • Figure 1: Overview of rotational alignment in FedRot-LoRA. Top: Naive aggregation averages unaligned LoRA factors $(A_i, B_i)$, causing destructive interference due to cross-client subspace mismatch. Bottom: FedRot-LoRA applies client-specific rotations $R_i$ to align local updates prior to aggregation. This preserves the semantic update while reducing subspace misalignment, leading to lower aggregation error and more stable training behavior.
  • Figure 2: Optimization trajectories with target $\Delta W^* = 1.0$, initialized at $\bar{B}^0 = 0$ and $\bar{A}^0 = 0.44$ (black star). The red dashed curve denotes the optimal solution manifold. Each marker along a trajectory represents the global factors $(B, A)$ after one communication round under a given method. Different aggregation schemes induce distinct optimization paths, illustrating the impact of rotational misalignment on training dynamics.
  • Figure 3: Effect of soft rotation level $\lambda$ in FedRot-LoRA on MNLI and QQP. Red dashed line denotes the baseline's average accuracy.

Theorems & Definitions (8)

  • Theorem 4.4: Convergence Analysis
  • Theorem 4.8: Error Bound Analysis
  • Corollary 4.9: Feasible $\lambda$ for Strict Improvement
  • proof
  • Lemma 1.1: Soft Rotation Shrinkage
  • proof
  • proof
  • Remark 1.2: Rescaling Does Not Affect Rotational Alignment