Personalized Federated Instruction Tuning via Neural Architecture Search

Pengyu Zhang; Yingbo Zhou; Ming Hu; Junxian Feng; Jiawen Weng; Mingsong Chen

TL;DR

This paper tackles the problem of data and resource heterogeneity in Federated Instruction Tuning (FIT) by introducing PerFIT, a neural-architecture-search-based framework that personalizes instruction tuning. Each client searches a personalized architecture within an expanded trainable space using iterative pruning and Taylor-based importance scores, then prunes back to the original parameter count, while backbone parameters remain frozen. A personalized, parameter-wise aggregation scheme for sparse LoRA modules enables effective cross-client information sharing under heterogeneous resources. The authors provide convergence analysis under standard assumptions and demonstrate up to 23% perplexity reduction on non-IID LLM benchmarks, validating PerFIT's effectiveness for personalized, efficient instruction tuning in federated settings.
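The search step described above (score parameters in an expanded space, then prune iteratively back to the original budget) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the common first-order Taylor score $|\theta \cdot \partial\mathcal{L}/\partial\theta|$ and a linear pruning schedule, and the names `taylor_importance`, `iterative_prune`, `grad_fn`, and `steps` are hypothetical.

```python
import numpy as np

def taylor_importance(weights, grads):
    # Assumed first-order Taylor score: |w * dL/dw| per parameter.
    return np.abs(weights * grads)

def iterative_prune(weights, grad_fn, budget, steps=4):
    """Prune an expanded trainable tensor down to `budget` entries.

    weights: expanded trainable parameters (e.g. an enlarged LoRA matrix).
    grad_fn: hypothetical callable returning dL/dw for the current weights.
    budget:  number of entries to keep (1 <= budget <= weights.size).
    """
    total = weights.size
    mask = np.ones(weights.shape, dtype=bool)
    for step in range(1, steps + 1):
        # Linear schedule from `total` kept entries down to `budget`.
        target = int(round(total - (total - budget) * step / steps))
        scores = taylor_importance(weights, grad_fn(weights))
        scores[~mask] = -np.inf  # entries pruned earlier are never revived
        keep = np.argpartition(scores.ravel(), -target)[-target:]
        mask = np.zeros(total, dtype=bool)
        mask[keep] = True
        mask = mask.reshape(weights.shape)
        weights = weights * mask
    return weights, mask

# Toy usage: loss = sum(w**2), so dL/dw = 2w; expanded space of 32, budget 16.
rng = np.random.default_rng(0)
expanded = rng.normal(size=(4, 8))
pruned, mask = iterative_prune(expanded, lambda w: 2.0 * w, budget=16)
```

In a real client, local fine-tuning steps would run between pruning rounds so that the scores reflect the client's own data; the toy usage omits that training.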

Abstract

Federated Instruction Tuning (FIT) enables collaborative instruction tuning among massive numbers of data owners without sharing private data. However, it still faces two key challenges: data heterogeneity and resource heterogeneity. Because data distributions and preferences vary among data owners, FIT cannot adapt to the personalized data of individual owners. Moreover, clients with superior computational abilities are constrained because they must maintain the same fine-tuning architecture as weaker clients. To address these issues, we propose a novel Personalized Federated Instruction Tuning (PerFIT) framework based on architecture search. Specifically, PerFIT allows each client to search for a personalized architecture by expanding the trainable parameter space of the global model and then pruning the parameters back to the original count. This procedure allows personalized instruction fine-tuning within expanded parameter spaces while preserving the same number of trainable parameters. Furthermore, to exploit heterogeneous computational resources and enhance personalization on local data, we use personalized parameter-wise aggregation. Evaluation with multiple LLMs in non-IID scenarios demonstrates that, compared to state-of-the-art FIT methods, our approach achieves up to a 23% decrease in perplexity.
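To make the idea of parameter-wise aggregation over sparse modules concrete, here is a minimal sketch. It assumes each client uploads a dense-shaped update plus a binary mask of the entries its personalized architecture keeps, and that the server averages every entry only over the clients that retain it. The exact weighting used by PerFIT is not given in this summary, so the rule below and the names `parameterwise_aggregate` and `personalize` are assumptions for illustration.

```python
import numpy as np

def parameterwise_aggregate(updates, masks):
    """Average each parameter over the clients whose mask retains it.

    updates: list of arrays with identical shape (one per client).
    masks:   list of binary arrays, 1 where the client keeps the entry.
    Entries kept by no client stay at zero.
    """
    stacked = np.stack(updates).astype(float)
    kept = np.stack(masks).astype(float)
    counts = kept.sum(axis=0)
    summed = (stacked * kept).sum(axis=0)
    return np.divide(summed, counts, out=np.zeros_like(summed), where=counts > 0)

def personalize(global_update, mask):
    # A client reads back only the entries inside its own sparse architecture.
    return global_update * np.asarray(mask, dtype=float)

# Toy usage with two clients and overlapping sparse supports.
u0, m0 = np.array([1.0, 2.0, 3.0, 4.0]), np.array([1, 1, 0, 0])
u1, m1 = np.array([10.0, 20.0, 30.0, 40.0]), np.array([0, 1, 1, 0])
aggregated = parameterwise_aggregate([u0, u1], [m0, m1])
client0_view = personalize(aggregated, m0)
```

Averaging per entry rather than per tensor means a parameter trained by only one client is not diluted by zeros from clients that pruned it.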

Paper Structure (14 sections, 1 theorem, 11 equations, 7 figures, 2 tables, 2 algorithms)

Introduction
Related Work
Preliminaries
Personalized Federated Learning
Neural Architecture Search (NAS)
Low-Rank Adapter (LoRA)
Methodology
Overview of PerFIT
Implementation Details
Convergence Analysis
Experiments
Experimental Settings
Performance Evaluation
Conclusion

Key Result

Theorem 1

(Convergence of PerFIT). Let $N$ and $S$ represent the number of local steps and the number of participants in each round, respectively. Given the aforementioned assumptions, assume that the learning rate $\eta\leq\frac{1}{16LN}$, the personalized fine-tuning modules $\Delta\tilde{\boldsymbol{\theta}}_{\dots}$ … where $\kappa=\frac{1}{2}-150N^{3}\eta^{3}L^{3}-15N^{2}\eta^{2}L^{2}-5N\eta L$, $\rho=(25N^{3}\eta^{\dots}$ …
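
As a quick check on the constant $\kappa$ from Theorem 1 (a worked evaluation added for illustration, not part of the theorem statement): at the largest admissible learning rate $\eta=\frac{1}{16LN}$ we have $N\eta L=\frac{1}{16}$, so

$$
\kappa=\frac{1}{2}-\frac{150}{16^{3}}-\frac{15}{16^{2}}-\frac{5}{16}
=\frac{2048-150-240-1280}{4096}
=\frac{189}{2048}\approx 0.0923 .
$$

Each subtracted term increases with $N\eta L$, so $\kappa\geq\frac{189}{2048}>0$ for every positive learning rate satisfying $\eta\leq\frac{1}{16LN}$.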

Figures (7)

Figure 1: Workflow of our personalized federated instruction tuning approach.
Figure 2: Illustration of the personalized aggregation method.
Figure 3: Loss curves for homogeneous resources.
Figure 4: Loss curves for heterogeneous resources.
Figure 5: Comparison of different pruning metrics.
...and 2 more figures

Theorems & Definitions (1)

Theorem 1
