
FedMomentum: Preserving LoRA Training Momentum in Federated Fine-Tuning

Peishen Yan, Yang Hua, Hao Wang, Jiaru Zhang, Xiaoyu Wu, Tao Song, Haibing Guan

TL;DR

FedMomentum is a framework for structured, momentum-preserving LoRA aggregation via singular value decomposition (SVD); the authors report that it consistently outperforms prior state-of-the-art methods in convergence speed and final accuracy.

Abstract

Federated fine-tuning of large language models (LLMs) with low-rank adaptation (LoRA) offers a communication-efficient and privacy-preserving solution for task-specific adaptation. Naive aggregation of LoRA modules introduces noise due to mathematical incorrectness when averaging the downsampling and upsampling matrices independently. However, existing noise-free aggregation strategies inevitably compromise the structural expressiveness of LoRA, limiting its ability to retain client-specific adaptations by either improperly reconstructing the low-rank structure or excluding partially trainable components. We identify this problem as loss of training momentum, where LoRA updates fail to accumulate effectively across rounds, resulting in slower convergence and suboptimal performance. To address this, we propose FedMomentum, a novel framework that enables structured and momentum-preserving LoRA aggregation via singular value decomposition (SVD). Specifically, after aggregating low-rank updates in a mathematically correct manner, FedMomentum applies SVD to extract the dominant components that capture the main update directions. These components are used to reconstruct the LoRA modules with the same rank, while residual components can be retained and later merged into the backbone to preserve semantic information and ensure robustness. Extensive experiments across multiple tasks demonstrate that FedMomentum consistently outperforms prior state-of-the-art methods in convergence speed and final accuracy.
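The aggregation pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the toy dimensions, the uniform client weights, and the choice to split each singular value symmetrically (as its square root) between the two reconstructed LoRA factors are all assumptions made for illustration. The sketch shows (1) that averaging the two LoRA matrices independently differs from averaging their products, and (2) how a rank-r SVD truncation of the exact aggregate yields new LoRA factors plus a residual that could later be merged into the backbone.

```python
import numpy as np


def aggregate_exact(Bs, As, weights):
    """Weighted average of the per-client products B_i @ A_i."""
    return sum(w * (B @ A) for w, B, A in zip(weights, Bs, As))


def aggregate_naive(Bs, As, weights):
    """Average B and A independently, then multiply (the noisy variant)."""
    B_avg = sum(w * B for w, B in zip(weights, Bs))
    A_avg = sum(w * A for w, A in zip(weights, As))
    return B_avg @ A_avg


def svd_split(delta_w, rank):
    """Split an aggregated update into rank-`rank` LoRA factors and a residual.

    Assumption: singular values are split symmetrically (sqrt on each side);
    the source text does not say how they are distributed between B and A.
    """
    U, S, Vt = np.linalg.svd(delta_w, full_matrices=False)
    sqrt_s = np.sqrt(S[:rank])
    B_new = U[:, :rank] * sqrt_s          # shape (d_out, rank)
    A_new = sqrt_s[:, None] * Vt[:rank]   # shape (rank, d_in)
    residual = delta_w - B_new @ A_new    # non-dominant components
    return B_new, A_new, residual


# Toy setup (hypothetical sizes): 3 clients, LoRA rank 4.
rng = np.random.default_rng(0)
d_out, d_in, rank, n_clients = 32, 24, 4, 3
Bs = [rng.normal(size=(d_out, rank)) for _ in range(n_clients)]
As = [rng.normal(size=(rank, d_in)) for _ in range(n_clients)]
weights = np.full(n_clients, 1.0 / n_clients)

delta_w = aggregate_exact(Bs, As, weights)   # mathematically correct aggregate
naive = aggregate_naive(Bs, As, weights)     # contains cross-client terms
B_new, A_new, residual = svd_split(delta_w, rank)

print("naive vs exact gap:", np.linalg.norm(naive - delta_w))
print("residual norm:", np.linalg.norm(residual))
```

In this sketch the exact aggregate can have rank up to `n_clients * rank`, so a same-rank reconstruction necessarily leaves a residual; `B_new @ A_new + residual` recovers the aggregate exactly, which is why retaining the residual (for example, by adding it to the frozen weight matrix) loses no information.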

Paper Structure

This paper contains 33 sections, 7 equations, 7 figures, 9 tables.

Figures (7)

  • Figure 1: Training loss curves of fine-tuning LLaMA2-7B on MetaMathQA with 10 clients under different LoRA update strategies. We compare various centralized fine-tuning and federated fine-tuning strategies under a unified setting: LoRA rank = 16, batch size = 16, local update steps = 10, and 100 iterations. The same experimental settings are adopted in Figure \ref{fig:moti-loss}.
  • Figure 2: (Left) Normalized singular value spectrum for the first LoRA module of the first training round. The elbow point indicates the effective rank $r$ of the matrix. (Right) Residual rank statistics across all LoRA modules throughout training rounds. The solid line represents the average residual rank, while the shaded area reflects the range between the maximum and minimum values.
  • Figure 3: Overview of the SVD-based aggregation process.
  • Figure 4: Training loss for math reasoning task.
  • Figure 5: Statistical analysis for the aggregated updates of FedMomentum.
  • ...and 2 more figures