Flow: Per-Instance Personalized Federated Learning Through Dynamic Routing

Kunjal Panchal; Sunav Choudhary; Nisarg Parikh; Lijun Zhang; Hui Guan

TL;DR

Flow addresses non-IID heterogeneity in federated learning by introducing per-instance routing between a global model and a client-specific local model. Each client constructs a dynamic personalized model $w_p$ using a routing module $\psi_g$ that decides, for each input, whether to use the global parameters $w_g$ or the local parameters $w_\ell$. Client data are split into $\zeta_{m,\ell}$ and $\zeta_{m,g}$, and the server aggregates with FedAvg. The paper gives an explicit convergence analysis for both the global and the personalized models and shows, through cross-domain experiments on language and vision tasks, that Flow improves both generalized and personalized accuracy while remaining scalable and friendly to new clients. These results indicate that per-instance dynamic routing can enhance personalization in large-scale, cross-device FL with practical deployment benefits.

Abstract

Personalization in Federated Learning (FL) aims to modify a collaboratively trained global model according to each client. Current approaches to personalization in FL operate at a coarse granularity, i.e., all the input instances of a client use the same personalized model. This ignores the fact that some instances are more accurately handled by the global model due to its better generalizability. To address this challenge, this work proposes Flow, a fine-grained stateless personalized FL approach. Flow creates dynamic personalized models by learning a routing mechanism that determines whether an input instance prefers the local parameters or their global counterpart. Thus, Flow introduces per-instance routing in addition to leveraging per-client personalization to improve accuracy at each client. Further, Flow is stateless, which makes it unnecessary for a client to retain its personalized state across FL rounds. This makes Flow practical for large-scale FL settings and friendly to newly joined clients. Evaluations on the Stackoverflow, Reddit, and EMNIST datasets show that Flow achieves higher prediction accuracy than state-of-the-art non-personalized and per-client-only personalized approaches to FL.
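
The per-instance routing idea described above can be illustrated with a small sketch. This is not the authors' implementation: it assumes a single dense layer, a sigmoid gate standing in for the routing module, and hypothetical names (`route_instance`, `w_router`, `hard`). The figure list suggests the paper routes at each layer of a deeper model, which this toy example does not attempt.

```python
import math

def linear(weights, x):
    """Apply a dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def route_instance(x, w_global, w_local, w_router, hard=False):
    """Combine global and local layer outputs for one input instance.

    Returns (output, p_local), where p_local is the gate's preference
    for the local parameters on this instance. The gate here is an
    illustrative assumption, not the paper's routing module.
    """
    p_local = sigmoid(sum(w * xi for w, xi in zip(w_router, x)))
    out_g = linear(w_global, x)
    out_l = linear(w_local, x)
    if hard:
        # Hard policy: pick exactly one path per instance.
        return (out_l if p_local >= 0.5 else out_g), p_local
    # Soft policy: convex combination of the two paths.
    out = [p_local * l + (1.0 - p_local) * g for l, g in zip(out_l, out_g)]
    return out, p_local

# Toy parameters (hypothetical values chosen for readability).
w_global = [[1.0, 0.0], [0.0, 1.0]]
w_local = [[2.0, 0.0], [0.0, 2.0]]
w_router = [1.0, -1.0]

for x in ([3.0, 0.0], [0.0, 3.0]):
    out, p = route_instance(x, w_global, w_local, w_router, hard=True)
    print(x, "->", "local" if p >= 0.5 else "global", out)
```

With these toy weights, two inputs from the same client take different paths, which is the distinction between per-instance and per-client personalization. Setting `hard=False` blends the two outputs by the gate probability instead of selecting one.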

Paper Structure (43 sections, 17 theorems, 119 equations, 14 figures, 8 tables, 1 algorithm)

Introduction
Related Work
Our Approach
Local Parameters.
Routing Module.
Global Parameters.
Soft versus Hard Policy.
Theoretical Analysis
Discussion.
Discussion.
Experiments and Results
Performance Comparison
Ablation Studies
Conclusion
Limitations, Future Work, and Broader Impact
...and 28 more sections

Key Result

Theorem 4.1

If each client's objective function $f_m$ (and hence the global objective function $F$) satisfies $\beta$-smoothness, $\sigma_\ell$-bounded local gradient variance, $(G,B)$-dissimilarity assumptions, using the learning rate $\frac{1}{2\beta} \leq \eta_\ell \leq \frac{1}{2 \sqrt{5} \beta B K^2 \sqrt{

Figures (14)

Figure 2: $w_{g}$ and $w_{p}$ accuracies for Stackoverflow.
Figure 3: Behavior of the routing policy from $\psi_{g}$ for all instances at each layer for Stackoverflow.
Figure 4: Ablation studies on Stackoverflow dataset.
Figure 5: Learning curves on Generalized Accuracy Metric of Flow and its baselines.
Figure 6: Learning curves on Personalized Accuracy Metric of Flow and its baselines.
...and 9 more figures

Theorems & Definitions (33)

Theorem 4.1: Convergence of the Global Model
Theorem 4.2: Convergence of the Personalized Model
Definition E.5: Gradient Diversity
Lemma E.6: Local model progress
proof
Lemma E.7: Local version of the global model progress
proof
Lemma E.8: Deviation of the personalized model from the global model
proof
Theorem E.9: Convergence of the Global Model for Convex Cases
...and 23 more
