
Personalized Federated Fine-tuning for Heterogeneous Data: An Automatic Rank Learning Approach via Two-Level LoRA

Jie Hao, Yuman Wu, Ali Payani, Myungjin Lee, Mingrui Liu

TL;DR

PF2LoRA introduces a two-level low-rank adaptation for personalized federated fine-tuning on heterogeneous data, combining a common adapter with a lightweight client-specific adapter to automatically learn per-client ranks within a flexible range. Framed as a bilevel optimization problem, the upper level optimizes a shared adapter while the lower level personalizes per client, enabling data-driven rank adaptation and reduced hyperparameter tuning. Empirical results on NLU and NLG benchmarks show PF2LoRA consistently outperforms HOMLoRA, Per-FedAvg-LoRA, and HETLoRA with minimal memory overhead, and a synthetic study provides theoretical and empirical justification for automatic rank discovery. The work advances practical federated fine-tuning of foundation models by balancing personalization with efficiency, backed by convergence guarantees in a simplified setting and robust experimental validation.
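The "flexible range" of per-client ranks can be illustrated with a small numerical sketch. It assumes, as a labeled simplification not confirmed by this summary, that the two adapters combine additively, so a client's weight update is $BA + D_kC_k$ with a common adapter of rank $r$ and a client-specific adapter of rank at most $\tilde{r}$. Under that assumption, standard rank inequalities place the rank of the update between $r - \tilde{r}$ and $r + \tilde{r}$. The shapes, ranks, and the helper `update_rank` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 16, 12          # hypothetical layer shape
r, r_tilde = 4, 1      # hypothetical common and client-specific ranks

# Common adapter {A, B}, shared by all clients.
B = rng.standard_normal((m, r))
A = rng.standard_normal((r, n))

def update_rank(B, A, D_k, C_k):
    """Numerical rank of the assumed additive update B @ A + D_k @ C_k."""
    return int(np.linalg.matrix_rank(B @ A + D_k @ C_k))

# Client 1: a generic client-specific adapter typically raises the rank.
D_1 = rng.standard_normal((m, r_tilde))
C_1 = rng.standard_normal((r_tilde, n))
rank_1 = update_rank(B, A, D_1, C_1)

# Client 2: an adapter that cancels one component of B @ A lowers the rank.
D_2 = -B[:, :r_tilde]
C_2 = A[:r_tilde, :]
rank_2 = update_rank(B, A, D_2, C_2)

print(rank_1, rank_2)
```

In this sketch the two clients share the same common adapter yet end up with different effective ranks, which is the behavior the summary attributes to data-driven rank adaptation. How the actual algorithm arrives at such adapters through training is not shown here.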

Abstract

We study personalized federated fine-tuning of language models on heterogeneous data, where clients collaboratively fine-tune a language model (e.g., BERT, GPT) without sharing their local data while also personalizing the model to each client. While recent efforts have applied parameter-efficient fine-tuning techniques like low-rank adaptation (LoRA) in federated settings, they typically use single or multiple independent low-rank adapters with predefined maximal and minimal ranks, which may not be optimal for diverse data sources across clients. To address this issue, we propose PF2LoRA, a new personalized federated fine-tuning algorithm built on a novel automatic rank learning approach via two-level LoRA. Given a pretrained language model whose weights are frozen, our algorithm learns two levels of adaptation simultaneously: the first level learns a common adapter for all clients, while the second level fosters individual client personalization. A key advantage of PF2LoRA is its ability to adaptively determine a suitable rank based on an individual client's data, rather than relying on a predefined rank that is agnostic to data heterogeneity. We present a synthetic example that highlights how PF2LoRA automatically learns the ground-truth rank for each client, tailoring the adaptation to match the properties of that client's data. Notably, this approach introduces minimal additional memory overhead, as the second-level adaptation comprises a small number of parameters compared to the first level. Our experiments on natural language understanding and generation tasks demonstrate that PF2LoRA significantly outperforms existing federated fine-tuning methods.
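The memory-overhead claim can be made concrete with a quick parameter count. A rank-$r$ adapter on an $m \times n$ weight stores $r(m+n)$ parameters, so the client-specific adapter costs a fraction $\tilde{r}/r$ of the common one. The layer size and the ranks below are hypothetical placeholders, not values reported by the paper, and `lora_param_count` is an illustrative helper.

```python
def lora_param_count(m, n, rank):
    """Parameters in one low-rank adapter: an (m x rank) and a (rank x n) factor."""
    return rank * (m + n)

m, n = 768, 768        # hypothetical weight shape
r, r_tilde = 8, 2      # hypothetical ranks, with r_tilde smaller than r

common_params = lora_param_count(m, n, r)           # first-level adapter
personal_params = lora_param_count(m, n, r_tilde)   # second-level adapter
overhead = personal_params / common_params          # equals r_tilde / r

print(common_params, personal_params, overhead)
```

The ratio depends only on the two ranks, so the overhead stays small whenever the second-level rank is chosen well below the first-level rank.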

Paper Structure

This paper contains 41 sections, 10 theorems, 65 equations, 5 figures, 22 tables, 1 algorithm.

Key Result

Theorem 7.2

Suppose Assumption ass:bilevel holds. Define the smoothness parameter $L_\Phi=L_{f,1}+\frac{L_{f,1}^2}{\mu}$, and choose $\alpha=\frac{1}{4L_{f,1}}, \eta=\min\left(\frac{\mu^2}{5L_{f,1}^3\sqrt{(\frac{4L_{f,1}}{\mu}-\frac{\mu}{4L_{f,1}})}}, \frac{1}{8L_\Phi}, \sqrt{\frac{1}{16N}}, \sqrt[3]{\frac{1}{8

Figures (5)

  • Figure 1: Overview of the two-level low-rank adaptation framework. The first level learns a common adapter $\{A, B\}$ for all clients, and the common adapter is synchronized by averaging across all the clients at every communication round. The second level aims to learn a client-specific and lightweight adapter $\{C_k, D_k\}$ for a specific client $k\in[1, M]$, while the lightweight adapters introduce negligible additional memory overhead.
  • Figure 2: Comparison of two algorithms. Left to right: the training loss on two clients, the testing loss on two clients, Frobenius norm distance $\|W_k - W_k^*\|_F$, $k = 1, 2$, and the rank evolution of two clients.
  • Figure 3: Performance comparison with/without bilevel optimization (BO). We report Matthews correlation for CoLA and accuracy for MNLI, SST-2, QQP and QNLI. A higher score means better performance.
  • Figure 4: The averaged training loss and perplexity on the WebNLG natural language generation task.
  • Figure 5: Sensitivity analysis of hyperparameters.
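
The synchronization step described in the Figure 1 caption can be sketched as follows. This is a minimal illustration, assuming that "averaging" means an element-wise mean of the factors $A$ and $B$ across clients and that the client-specific adapters $\{C_k, D_k\}$ never leave the client. The dictionary layout, the shapes, and the function name `synchronize_common_adapter` are hypothetical.

```python
import numpy as np

def synchronize_common_adapter(clients):
    """Average the common adapter {A, B} over clients; leave {C_k, D_k} untouched."""
    A_avg = np.mean([c["A"] for c in clients], axis=0)
    B_avg = np.mean([c["B"] for c in clients], axis=0)
    for c in clients:
        # Every client restarts the next round from the same common adapter.
        c["A"] = A_avg.copy()
        c["B"] = B_avg.copy()
    return A_avg, B_avg

rng = np.random.default_rng(0)
m, n, r, r_tilde, M = 6, 5, 3, 1, 4   # hypothetical sizes and client count
clients = [
    {
        "A": rng.standard_normal((r, n)),
        "B": rng.standard_normal((m, r)),
        "C": rng.standard_normal((r_tilde, n)),   # client-specific, stays local
        "D": rng.standard_normal((m, r_tilde)),   # client-specific, stays local
    }
    for _ in range(M)
]
personal_before = [(c["C"].copy(), c["D"].copy()) for c in clients]

A_avg, B_avg = synchronize_common_adapter(clients)
```

Only the common adapter is exchanged in this sketch, so the per-round communication cost does not grow with the size of the client-specific adapters.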

Theorems & Definitions (18)

  • Theorem 7.2: Convergence Guarantees
  • Lemma 9.1: gradient descent for strongly convex and smooth functions
  • proof
  • Lemma 9.2: true hypergradient
  • proof
  • Lemma 9.3: Lipschitz property [ghadimi2018approximation]
  • Lemma 9.4: Lipschitz hypergradient
  • proof
  • Lemma 9.5: Hypergradient bias
  • proof
  • ...and 8 more