HiLoRA: Hierarchical Low-Rank Adaptation for Personalized Federated Learning

Zihao Peng; Nan Zou; Jiandian Zeng; Guo Li; Ke Chen; Boyuan Li; Tian Wang

TL;DR

HiLoRA is a hierarchical LoRA framework that places adapters at three levels (root, cluster, and leaf) to capture global, subgroup, and client-specific knowledge, respectively. It also introduces a LoRA-Subspace Adaptive Clustering mechanism that infers latent client groups via subspace similarity analysis, facilitating knowledge sharing across structurally aligned clients.

Abstract

Vision Transformers (ViTs) have been widely adopted in vision tasks due to their strong transferability. In Federated Learning (FL), where full fine-tuning is communication-heavy, Low-Rank Adaptation (LoRA) provides an efficient and communication-friendly way to adapt ViTs. However, existing LoRA-based federated tuning methods overlook latent client structures in real-world settings, which limits shared representation learning and hinders effective adaptation to unseen clients. To address this, we propose HiLoRA, a hierarchical LoRA framework that places adapters at three levels (root, cluster, and leaf) to capture global, subgroup, and client-specific knowledge, respectively. Through cross-tier orthogonality and cascaded optimization, HiLoRA separates update subspaces and aligns each tier with its residual personalized objective. In particular, we develop a LoRA-Subspace Adaptive Clustering mechanism that infers latent client groups via subspace similarity analysis, thereby facilitating knowledge sharing across structurally aligned clients. Theoretically, we establish a tier-wise generalization analysis that supports HiLoRA's design. Experiments on ViT backbones with CIFAR-100 and DomainNet demonstrate consistent improvements in both personalization and generalization.
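To make the three-tier structure concrete, the following is a minimal sketch of how a client's effective update could be assembled from root, cluster, and leaf adapters, following the formula given in the Figure 2 caption, $\Delta\mathbf{W}_{i}=\mathbf{B}_{r}\mathbf{A}_{r}+\mathbf{B}_{c,j}\mathbf{A}_{c,j}+\mathbf{B}_{\ell,i}\mathbf{A}_{\ell,i}$. It is not the authors' implementation: the function names, the dictionary layout, and the toy rank-1 matrices are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Adapter = Tuple[Matrix, Matrix]  # (B, A): B is d_out x r, A is r x d_in


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python product of x (m x k) and y (k x n)."""
    assert len(x[0]) == len(y), "inner dimensions must match"
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]


def add(x: Matrix, y: Matrix) -> Matrix:
    """Element-wise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(row_x, row_y)] for row_x, row_y in zip(x, y)]


def effective_update(root: Adapter,
                     clusters: Dict[int, Adapter],
                     leaves: Dict[str, Adapter],
                     cluster_of: Dict[str, int],
                     client: str) -> Matrix:
    """Delta W_i = B_r A_r + B_{c,j} A_{c,j} + B_{l,i} A_{l,i}, with j = j(i)."""
    b_r, a_r = root
    b_c, a_c = clusters[cluster_of[client]]  # adapter shared by the client's cluster
    b_l, a_l = leaves[client]                # adapter private to this client
    return add(add(matmul(b_r, a_r), matmul(b_c, a_c)), matmul(b_l, a_l))


# Toy example (hypothetical values): a 3x3 weight with rank-1 adapters per tier.
root = ([[1.0], [0.0], [0.0]], [[1.0, 0.0, 0.0]])
clusters = {0: ([[0.0], [1.0], [0.0]], [[0.0, 2.0, 0.0]])}
leaves = {"client_a": ([[0.0], [0.0], [1.0]], [[0.0, 0.0, 3.0]])}
cluster_of = {"client_a": 0}

delta_w = effective_update(root, clusters, leaves, cluster_of, "client_a")
```

In this toy setup the three $\mathbf{B}$ matrices have mutually orthogonal columns, so each tier writes to a different row of the update, which is the kind of subspace separation the cross-tier orthogonality constraint aims for.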

Paper Structure (18 sections, 2 theorems, 16 equations, 5 figures, 4 tables)

This paper contains 18 sections, 2 theorems, 16 equations, 5 figures, 4 tables.

Introduction
Preliminaries
LoRA Recap
Three-Level LoRA
Federated Learning Setup
The HiLoRA Framework
Hierarchical Orthogonal LoRA Decomposition
LoRA-Subspace Adaptive Clustering
Cascaded Tier-wise Optimization
Personalization and Generalization in HiLoRA
Theoretical Guarantees
Experiments
Experimental Setup
Performance Evaluation
Label-Heterogeneous Federated Setting
...and 3 more sections

Key Result

Theorem 1

Under Assumption ass:orth, for any client $i$ and any $\delta\in(0,1)$, with probability at least $1-3\delta$,

Figures (5)

Figure 1: Challenges of Dual-LoRA. Six clients own non-IID data that form three latent clusters (vehicles, insects, fruits). Global LoRA forces a “one-size-fits-all” adapter, causing (a) gradient drift and (b) loss of cluster-level cues. Fully personalized LoRA removes sharing but (c) overfits scarce local data. These limitations motivate a hierarchical LoRA design.
Figure 2: HiLoRA overview. Top-right: cascaded tier-wise optimization with progressive freezing. (A) Train the Root-LoRA as the global adapter; (B) perform LoRA-Subspace Adaptive Clustering to identify client communities; (C) update the Cluster-LoRA with an orthogonality constraint to the frozen root; (D) adapt the Leaf-LoRA orthogonal to both root and cluster tiers. Each client $i$ with cluster index $j=j(i)$ updates its effective LoRA as $\Delta\mathbf{W}_{i}=\mathbf{B}_{r}\mathbf{A}_{r}+\mathbf{B}_{c,j}\mathbf{A}_{c,j}+\mathbf{B}_{\ell,i}\mathbf{A}_{\ell,i}$.
Figure 3: t-SNE of clustering results on CIFAR-100. Clients are visualized by Jensen–Shannon distances of their label distributions: (a) Patho-10, $K^{\star}{=}10$; (b) SC–Dir($\alpha{=}3$), $K^{\star}{=}20$.
Figure 4: Principal-angle distributions between tiers in HiLoRA. Computed from the column spaces of the LoRA-$\mathbf{B}$ matrices. Lower $\cos^2\theta$ indicates stronger subspace orthogonality.
Figure 5: Unseen-client accuracy (mean ± std) across adaptation epochs on (a) CIFAR-100 and (b) DomainNet. The starred point indicates the Root+Cluster initialization, while subsequent epochs correspond to leaf-level LoRA adaptation.
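The subspace-overlap measure described in the Figure 4 caption can be illustrated with a small sketch. The code below computes the mean $\cos^2\theta$ over the principal angles between the column spaces of two LoRA-$\mathbf{B}$ matrices, using the identity $\sum_k \cos^2\theta_k = \lVert \mathbf{Q}_u^{\top}\mathbf{Q}_v \rVert_F^2$ for orthonormal bases $\mathbf{Q}_u$ and $\mathbf{Q}_v$. This summary statistic, the Gram-Schmidt helper, and the example vectors are assumptions for illustration. The paper reports the distribution of angles and may compute it differently.

```python
import math
from typing import List

Vector = List[float]


def orthonormal_basis(columns: List[Vector], tol: float = 1e-12) -> List[Vector]:
    """Modified Gram-Schmidt over column vectors; near-dependent columns are dropped."""
    basis: List[Vector] = []
    for col in columns:
        v = list(col)
        for q in basis:
            proj = sum(a * b for a, b in zip(q, v))
            v = [a - proj * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > tol:
            basis.append([a / norm for a in v])
    return basis


def mean_cos2(cols_u: List[Vector], cols_v: List[Vector]) -> float:
    """Mean cos^2 of the principal angles between span(cols_u) and span(cols_v).

    Uses sum_k cos^2(theta_k) = ||Q_u^T Q_v||_F^2, where there are
    min(dim U, dim V) principal angles.
    """
    q_u = orthonormal_basis(cols_u)
    q_v = orthonormal_basis(cols_v)
    if not q_u or not q_v:
        raise ValueError("both inputs must span a non-trivial subspace")
    frob_sq = sum(sum(a * b for a, b in zip(u, v)) ** 2 for u in q_u for v in q_v)
    return frob_sq / min(len(q_u), len(q_v))


# Hypothetical columns of B matrices from different tiers (each inner list is one column).
root_cols = [[1.0, 0.0, 0.0]]
cluster_cols = [[0.0, 1.0, 0.0]]   # orthogonal to the root column
leaky_cols = [[1.0, 1.0, 0.0]]     # at 45 degrees to the root column

overlap_orthogonal = mean_cos2(root_cols, cluster_cols)
overlap_leaky = mean_cos2(root_cols, leaky_cols)
```

A value near 0 indicates nearly orthogonal tiers, consistent with the caption's reading that lower $\cos^2\theta$ means stronger subspace orthogonality. A value near 1 indicates that the two adapters span almost the same directions.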

Theorems & Definitions (2)

Theorem 1: HiLoRA Excess-Risk Generalization Bound
Corollary 1
