
Communication-Efficient Personalized Adaptation via Federated-Local Model Merging

Yinan Zou, Md Kamran Chowdhury Shisher, Christopher G. Brinton, Vishrant Tripathi

TL;DR

Potara is a principled framework for federated personalization that constructs a personalized model for each client by merging two complementary models: a federated model that captures general knowledge and a local model that captures personalized knowledge.
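The sketch below shows the kind of merging step described above. It makes several assumptions. Each model is represented as a dictionary of flattened parameter lists (for example, LoRA adapter weights) with matching keys. The merge is a convex combination that uses one scalar weight `lam_fed` on the federated model. The helper `merge_models`, the parameter names, and the toy values are hypothetical illustrations, not Potara's implementation.

```python
from typing import Dict, List

# Hypothetical layout: {parameter name: flattened values}, e.g. LoRA A/B matrices.
Params = Dict[str, List[float]]


def merge_models(fed: Params, local: Params, lam_fed: float) -> Params:
    """Return lam_fed * fed + (1 - lam_fed) * local, parameter by parameter.

    Hypothetical helper: assumes both models share identical names and shapes.
    """
    if not 0.0 <= lam_fed <= 1.0:
        raise ValueError("lam_fed must lie in [0, 1]")
    if fed.keys() != local.keys():
        raise ValueError("models must have identical parameter names")
    merged: Params = {}
    for name, fed_vals in fed.items():
        loc_vals = local[name]
        if len(fed_vals) != len(loc_vals):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Convex combination of general (federated) and personalized (local) weights.
        merged[name] = [lam_fed * f + (1.0 - lam_fed) * g for f, g in zip(fed_vals, loc_vals)]
    return merged


if __name__ == "__main__":
    fed = {"lora_A": [1.0, 2.0], "lora_B": [0.0, 4.0]}
    local = {"lora_A": [3.0, 2.0], "lora_B": [2.0, 0.0]}
    print(merge_models(fed, local, lam_fed=0.25))
```

With `lam_fed=0.25` the merge leans toward the local model and prints `{'lora_A': [2.5, 2.0], 'lora_B': [1.5, 1.0]}`. Setting `lam_fed=1.0` or `lam_fed=0.0` recovers the federated or local model exactly. A single scalar per client is the simplest choice. The paper's closed-form weights may be structured differently.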

Abstract

Parameter-efficient fine-tuning methods, such as LoRA, offer a practical way to adapt large vision and language models to client tasks. However, such adaptation becomes particularly challenging under task-level heterogeneity in federated deployments. In this regime, personalization requires balancing general knowledge with personalized knowledge, yet existing approaches largely rely on heuristic mixing rules that lack theoretical justification. Moreover, prior model merging approaches are computation- and communication-intensive, which makes them inefficient in federated settings. In this work, we propose Potara, a principled framework for federated personalization that constructs a personalized model for each client by merging two complementary models: (i) a federated model capturing general knowledge, and (ii) a local model capturing personalized knowledge. Using linear mode connectivity, we show that the expected task loss admits a variance-trace upper bound, whose minimization yields closed-form optimal mixing weights that guarantee a tighter bound for the merged model than for either the federated or local model alone. Experiments on vision and language benchmarks show that Potara consistently improves personalization while reducing communication, yielding a strong performance-communication trade-off.
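The abstract says that minimizing a variance-trace bound gives closed-form mixing weights. It also says the merged model can never be bounded worse than either endpoint. The paper's exact bound is not reproduced here. The derivation below is a hedged illustration of that mechanism using the textbook minimum-variance combination of two estimators. It is our simplification, not the paper's stated setup. We assume the FedIT and local parameters deviate from a hypothetical client target $\theta^\star$ by errors $e_{\mathrm{F}}$ and $e_{\mathrm{L}}$, and we take $\lambda$ as the weight on the FedIT model.

```latex
\begin{aligned}
\theta_{\mathrm{Merge}}(\lambda) &= \lambda\,\theta_{\mathrm{FedIT}} + (1-\lambda)\,\theta_{\mathrm{Local}},
  \qquad e_{\mathrm{F}} = \theta_{\mathrm{FedIT}} - \theta^\star,\quad e_{\mathrm{L}} = \theta_{\mathrm{Local}} - \theta^\star,\\
a &= \mathbb{E}\|e_{\mathrm{F}}\|^2, \qquad b = \mathbb{E}\|e_{\mathrm{L}}\|^2, \qquad c = \mathbb{E}\langle e_{\mathrm{F}}, e_{\mathrm{L}}\rangle,\\
V(\lambda) &= \mathbb{E}\big\|\theta_{\mathrm{Merge}}(\lambda) - \theta^\star\big\|^2
  = \mathbb{E}\big\|\lambda e_{\mathrm{F}} + (1-\lambda) e_{\mathrm{L}}\big\|^2
  = \lambda^2 (a + b - 2c) - 2\lambda (b - c) + b,\\
\lambda^\star &= \frac{b - c}{a + b - 2c}
  \qquad \text{(valid when } a + b - 2c = \mathbb{E}\|e_{\mathrm{F}} - e_{\mathrm{L}}\|^2 > 0\text{)},\\
V(\lambda^\star) &= \frac{ab - c^2}{a + b - 2c} \;\le\; \min\{V(0),\,V(1)\} = \min\{a,\,b\}.
\end{aligned}
```

Setting $\lambda = 0$ or $\lambda = 1$ recovers the local or federated model, so the minimizer can only match or improve on either endpoint. If $\lambda$ must stay in $[0, 1]$, clipping $\lambda^\star$ keeps the same guarantee because $V$ is a convex quadratic.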

Paper Structure

This paper contains 36 sections, 4 theorems, 55 equations, 4 figures, 9 tables, 1 algorithm.

Key Result

Theorem 3.6

Under the linear mode connectivity (LMC) basin definition and the assumptions on the basin event, smoothness, $L_H$, and joint Gaussianity, for $i\in\{\mathrm{FedIT},\mathrm{Local},\mathrm{Merge}\}$, it holds that

Figures (4)

  • Figure 1: Illustration of Potara. We train a FedIT model via FL and train local fine-tuned models. We obtain the final personalized model by a weighted merge of the FedIT and local models.
  • Figure 2: Performance-communication trade-off across different benchmarks. Potara ($t$) denotes our method where the FedIT model obtained at communication round $t$ is used for merging. Potara achieves strong performance with substantially lower communication overhead than baselines.
  • Figure 3: Ablation studies demonstrating the performance of Potara.
  • Figure 4: Client performance under various mixing weights. For each client, we construct a personalized model by merging a FedIT checkpoint from round 50, 150, or 300 with a fixed Local model from round 300, and sweep the mixing weight $\lambda_{\text{FedIT}}$ (x-axis). Curves report the resulting client accuracy. The red dashed line marks the mixing weight computed by our method for each client.
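Figure 4 sweeps the mixing weight and marks the weight computed by the method. The toy sketch below mimics that comparison under loudly stated assumptions. It does not use the paper's bound. Instead it uses the textbook minimum-variance weight $\lambda^\star = (b - c)/(a + b - 2c)$ for $\lambda\,\theta_{\mathrm{FedIT}} + (1-\lambda)\,\theta_{\mathrm{Local}}$. Here $a$ and $b$ are the mean squared deviations of the FedIT and local parameters from a target, and $c$ is their mean inner product. Synthetic Gaussian error vectors stand in for real checkpoints, and squared deviation replaces client accuracy. The helpers `error_moments`, `optimal_weight`, and `merged_error` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def error_moments(e_fed: List[Vector], e_loc: List[Vector]) -> Tuple[float, float, float]:
    """Empirical a = E||e_F||^2, b = E||e_L||^2, c = E<e_F, e_L> over paired samples."""
    n = len(e_fed)
    a = sum(dot(f, f) for f in e_fed) / n
    b = sum(dot(g, g) for g in e_loc) / n
    c = sum(dot(f, g) for f, g in zip(e_fed, e_loc)) / n
    return a, b, c


def optimal_weight(a: float, b: float, c: float) -> float:
    """Minimizer of lam^2*a + (1-lam)^2*b + 2*lam*(1-lam)*c, clipped to [0, 1]."""
    denom = a + b - 2.0 * c  # equals E||e_F - e_L||^2, never negative in exact arithmetic
    if denom <= 0.0:
        return 0.5  # identical errors: every weight gives the same objective
    return min(1.0, max(0.0, (b - c) / denom))


def merged_error(e_fed: List[Vector], e_loc: List[Vector], lam: float) -> float:
    """Empirical E||lam*e_F + (1-lam)*e_L||^2, i.e. the merged model's squared deviation."""
    total = 0.0
    for f, g in zip(e_fed, e_loc):
        total += sum((lam * x + (1.0 - lam) * y) ** 2 for x, y in zip(f, g))
    return total / len(e_fed)


if __name__ == "__main__":
    rng = random.Random(0)
    dim, n = 8, 200
    # Synthetic errors: a shared component correlates the FedIT and Local deviations.
    shared = [[rng.gauss(0.0, 0.3) for _ in range(dim)] for _ in range(n)]
    e_fed = [[s + rng.gauss(0.0, 1.0) for s in row] for row in shared]
    e_loc = [[s + rng.gauss(0.0, 0.5) for s in row] for row in shared]

    a, b, c = error_moments(e_fed, e_loc)
    lam_star = optimal_weight(a, b, c)
    grid = [i / 20 for i in range(21)]  # sweep, analogous to the x-axis of Figure 4
    best = min(grid, key=lambda lam: merged_error(e_fed, e_loc, lam))
    print(f"closed-form lam_FedIT = {lam_star:.3f}, best grid point = {best:.2f}")
    print(f"V(lam*) = {merged_error(e_fed, e_loc, lam_star):.3f}, min(a, b) = {min(a, b):.3f}")
```

The empirical objective is exactly a quadratic in $\lambda$ built from the same moments. As a result, the best grid point should sit next to the closed-form weight, and $V(\lambda^\star)$ should not exceed $\min(a, b)$. This resembles comparing the red dashed line in Figure 4 with the swept curve.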

Theorems & Definitions (8)

  • Definition 2.1
  • Definition 3.1
  • Theorem 3.6
  • Lemma 3.7
  • Lemma 1.1
  • Proof
  • Lemma 1.2
  • Proof