
FedTreeLoRA: Reconciling Statistical and Functional Heterogeneity in Federated LoRA Fine-Tuning

Jieming Bian, Lei Wang, Letian Zhang, Jie Xu

Abstract

Federated Learning (FL) with Low-Rank Adaptation (LoRA) has become a standard approach for privacy-preserving LLM fine-tuning. However, existing personalized methods predominantly operate under a restrictive Flat-Model Assumption: they address client-side statistical heterogeneity but treat the model as a monolithic block, ignoring the functional heterogeneity across LLM layers. We argue that these two dimensions, statistical (horizontal) and functional (vertical), are orthogonal in source yet coupled in interaction, implying that the optimal depth of parameter sharing is a function of client similarity. To address this, we propose FedTreeLoRA, a framework that employs tree-structured aggregation for fine-grained, layer-wise alignment. By dynamically constructing an aggregation hierarchy, FedTreeLoRA allows clients to share a broad consensus on shallow 'trunks' while progressively specializing on deep 'branches'. Experiments on NLU and NLG benchmarks demonstrate that FedTreeLoRA significantly outperforms state-of-the-art methods by effectively reconciling generalization and personalization.
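The trunk-and-branch idea can be made concrete with a small Python sketch: build one client hierarchy with agglomerative hierarchical clustering (AHC), as the overview figure describes for the clients' LoRA B matrices, then average each layer within the clusters obtained by cutting that tree at a layer-specific count. Average linkage, Euclidean distance, flattening B to a list of floats, uniform averaging, and the helper names ahc_partitions and tree_aggregate are all hypothetical choices made for illustration, not details taken from the paper.

```python
# Hypothetical sketch of layer-wise tree-structured aggregation; not the
# authors' implementation. Assumptions: each client's LoRA B matrix for a
# layer is flattened to a list of floats, the hierarchy uses average-linkage
# AHC with Euclidean distance, and cluster members are averaged uniformly.
from math import dist
from typing import Dict, List

Vector = List[float]


def ahc_partitions(vectors: List[Vector]) -> Dict[int, List[List[int]]]:
    """Average-linkage AHC; returns the partition for every cluster count n..1."""
    clusters = [[i] for i in range(len(vectors))]
    partitions = {len(clusters): [list(c) for c in clusters]}
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Mean pairwise distance between the members of clusters a and b.
                d = sum(dist(vectors[i], vectors[j])
                        for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])  # merge cluster b into a
        del clusters[b]                                   # b > a, so index a is unchanged
        partitions[len(clusters)] = [list(c) for c in clusters]
    return partitions


def tree_aggregate(layer_params: List[List[Vector]],
                   partitions: Dict[int, List[List[int]]],
                   counts: List[int]) -> List[List[Vector]]:
    """Average layer l's parameters within each of the counts[l] tree clusters.

    Cuts of one dendrogram are nested, so a nondecreasing `counts` yields a
    shared trunk at shallow layers that splits into branches at deeper ones.
    """
    if any(x > y for x, y in zip(counts, counts[1:])):
        raise ValueError("cluster counts must be nondecreasing with depth")
    aggregated = []
    for layer, c in enumerate(counts):
        per_client: List[Vector] = [[] for _ in layer_params[layer]]
        for group in partitions[c]:
            columns = zip(*(layer_params[layer][k] for k in group))
            mean = [sum(col) / len(group) for col in columns]
            for k in group:
                per_client[k] = list(mean)
        aggregated.append(per_client)
    return aggregated


# Toy usage: 4 clients forming two well-separated groups, 3 layers.
b_vectors = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
parts = ahc_partitions(b_vectors)
params = [[[1.0], [2.0], [3.0], [4.0]] for _ in range(3)]
result = tree_aggregate(params, parts, counts=[1, 2, 4])
print(parts[2])   # [[0, 1], [2, 3]]
print(result[1])  # [[1.5], [1.5], [3.5], [3.5]]
```

Because every cut of a single dendrogram refines the coarser cuts, a nondecreasing schedule such as [1, 2, 4] guarantees that clients sharing a deep-layer branch also share every shallower trunk layer.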

Paper Structure (41 sections, 1 theorem, 57 equations, 8 figures, 19 tables, 1 algorithm)

Key Result

Theorem 5.1

Let the assumptions from smoothness through parameter alignment hold. Let $E$ be the number of local SGD steps and choose a stepsize $\eta>0$. Define the composite constant $\Gamma$, which collects all $O(\eta^2)$ terms from local updates and tree-structured aggregation, as […] for some constant $C>0$. Then the average squared gradient norm of the iterates generated by FedTreeLoRA satisfies […], where $\Delta$ denotes the initial optim…

Figures (8)

  • Figure 1: Vertical Heterogeneity. Aggregating only the shallow layers significantly outperforms aggregating only the deep layers.
  • Figure 2: The Coupling Effect of Dual Heterogeneity. As client distributions diverge (from Homogeneous to Heterogeneous), the optimal sharing boundary shifts from deep to shallow layers.
  • Figure 3: Overview of FedTreeLoRA. (1) Global Topological Structure Modeling: A hierarchy tree is built via agglomerative hierarchical clustering (AHC) on the clients' LoRA $B$ matrices during warmup to capture cross-client relationships. (2) Adaptive Layer-wise Alignment: For each layer $l$, the optimal cluster count $c_l^*$ is dynamically selected under a monotonicity constraint. (3) Cluster-External Expert Mechanism: Each client synthesizes its parameters by mixing a Cluster Expert with an External Expert via a learnable coefficient $\lambda_{l,k}$.
  • Figure 4: Layer-wise cluster counts ($c_l^*$) across different datasets. FedTreeLoRA adaptively identifies the optimal aggregation granularity specific to each data distribution.
  • Figure 5: Extended Motivational Studies. (a) Substantiates Observation 1 (Vertical Heterogeneity) on SST2 and QQP datasets. (b)–(d) Substantiate Observation 2 (Coupling Effect) across different tasks, confirming that the optimal sharing boundary consistently shifts towards shallower layers as data heterogeneity increases.
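Steps (2) and (3) of the Figure 3 caption can be sketched in Python under assumptions the caption does not pin down: the per-layer score used to choose $c_l^*$ is a hypothetical input, the monotonicity constraint is read as cluster counts that never decrease with depth, the External Expert for client k is taken to be the mean of all clients outside k's cluster, and $\lambda_{l,k}$ is parameterized as a sigmoid of a learnable scalar so the mix stays convex. The function names select_monotone_counts and mix_experts are illustrative only.

```python
# Hypothetical sketch of steps (2) and (3) in Figure 3; not the authors' code.
# Assumptions (none stated in the source): scores[l][c - 1] is some quality
# score for using c clusters at layer l; the monotonicity constraint means
# c_1 <= c_2 <= ... <= c_L; the External Expert for client k is the mean of
# all clients outside k's cluster; lambda = sigmoid(alpha) keeps the mix convex.
from math import exp
from typing import List

Vector = List[float]


def select_monotone_counts(scores: List[List[float]]) -> List[int]:
    """Maximize sum_l scores[l][c_l - 1] subject to nondecreasing c_l (DP)."""
    max_c = len(scores[0])
    best = list(scores[0])          # best[c]: top total so far ending at count c + 1
    back: List[List[int]] = []      # back[l - 1][c]: best predecessor index at layer l - 1
    for row_scores in scores[1:]:
        new_best, arg = [], []
        run_best, run_arg = float("-inf"), 0
        for c in range(max_c):
            # Running max over predecessors c' <= c enforces c_{l-1} <= c_l.
            if best[c] > run_best:
                run_best, run_arg = best[c], c
            new_best.append(row_scores[c] + run_best)
            arg.append(run_arg)
        best = new_best
        back.append(arg)
    c = max(range(max_c), key=lambda i: best[i])
    path = [c]
    for arg in reversed(back):      # walk predecessors from the deepest layer up
        c = arg[c]
        path.append(c)
    return [i + 1 for i in reversed(path)]


def mix_experts(params: List[Vector], clusters: List[List[int]],
                k: int, alpha: float) -> Vector:
    """lambda * Cluster Expert + (1 - lambda) * External Expert for client k."""
    own = next(g for g in clusters if k in g)
    others = [j for j in range(len(params)) if j not in own]

    def mean(idx: List[int]) -> Vector:
        return [sum(col) / len(idx) for col in zip(*(params[j] for j in idx))]

    cluster_expert = mean(own)
    if not others:                  # one global cluster: no external expert exists
        return cluster_expert
    external_expert = mean(others)
    lam = 1.0 / (1.0 + exp(-alpha))
    return [lam * a + (1.0 - lam) * b
            for a, b in zip(cluster_expert, external_expert)]


# Toy usage: the per-layer argmax would be [1, 2, 1], which is not monotone.
counts = select_monotone_counts([[4, 1, 0], [1, 3, 0], [3, 0, 2]])
print(counts)  # [1, 2, 3]
mixed = mix_experts([[1.0], [2.0], [3.0], [4.0]], [[0, 1], [2, 3]], k=0, alpha=0.0)
print(mixed)   # [2.5]
```

A dynamic program is used here rather than clipping per-layer choices to a running maximum because clipping can lose score: in the toy example, clipping [1, 2, 1] gives [1, 2, 2] with total 7, while the DP finds [1, 2, 3] with total 9.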

Theorems & Definitions (1)

  • Theorem 5.1