FedHCA$^2$: Towards Hetero-Client Federated Multi-Task Learning

Yuxiang Lu; Suizhi Huang; Yuwen Yang; Shalayiding Sirejiding; Yue Ding; Hongtao Lu

TL;DR

This work tackles Hetero-Client Federated Multi-Task Learning (HC-FMTL) by relaxing the model-congruity assumption of federated multi-task learning. It proposes FedHCA$^2$, which decouples aggregation into a Hyper Conflict-Averse scheme for encoders and a Hyper Cross Attention scheme for decoders, supplemented by learnable Hyper Aggregation Weights. Theoretical analysis links MTL and FL optimization and motivates conflict mitigation during encoder updates, while layer-wise cross attention enables fine-grained decoder interactions across heterogeneous tasks. Empirical results on PASCAL-Context and NYUD-v2 show that FedHCA$^2$ outperforms traditional FL and FMTL baselines, with ablations confirming that both the encoder and decoder aggregation schemes are needed and that the hyper weights add adaptability. The approach extends federated multi-task learning to realistic settings with diverse task setups and data domains, offering a flexible, scalable framework for personalized yet collaborative models across heterogeneous clients.

Abstract

Federated Learning (FL) enables joint training across distributed clients while keeping their local data private. Federated Multi-Task Learning (FMTL) builds on FL to handle multiple tasks, assuming model congruity, i.e., that an identical model architecture is deployed on each client. To relax this assumption and thus extend real-world applicability, we introduce a novel problem setting, Hetero-Client Federated Multi-Task Learning (HC-FMTL), to accommodate diverse task setups. The main challenge of HC-FMTL is the model incongruity issue, which invalidates conventional aggregation methods. It also makes accurate model aggregation harder in the presence of the data and task heterogeneity inherent in FMTL. To address these challenges, we propose the FedHCA$^2$ framework, which allows for federated training of personalized models by modeling relationships among heterogeneous clients. Drawing on our theoretical insights into the difference between multi-task and federated optimization, we propose the Hyper Conflict-Averse Aggregation scheme to mitigate conflicts during encoder updates. Additionally, inspired by task interaction in MTL, the Hyper Cross Attention Aggregation scheme uses layer-wise cross attention to enhance decoder interactions while alleviating model incongruity. Moreover, we employ learnable Hyper Aggregation Weights for each client to customize personalized parameter updates. Extensive experiments demonstrate the superior performance of FedHCA$^2$ in various HC-FMTL scenarios compared to representative methods. Our code will be made publicly available.

Paper Structure (16 sections, 1 theorem, 13 equations, 7 figures, 5 tables, 1 algorithm)

Introduction
Related Work
Personalized Federated Learning
Multi-Task Learning
Federated Multi-Task Learning
Methodology
Preliminary
Architecture Overview
Hyper Conflict-Averse Aggregation
Hyper Cross Attention Aggregation
Hyper Aggregation Weights
Experiments
Experimental Setup
Main Results
In-depth Analysis
...and 1 more section

Key Result

Theorem 1

Given clients that share an encoder and have task-specific decoders, gradient descent on the shared encoder in MTL is equivalent to average parameter aggregation in FL plus an extra term that maximizes the inner product of the gradients of every pair of tasks in each iteration.
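
The flavor of this statement can be checked numerically on a toy problem. The sketch below is an illustration under stated assumptions, not the paper's proof: it assumes two quadratic task losses $L_i(\theta) = \tfrac{1}{2}(\theta - c_i)^\top A_i (\theta - c_i)$ on a shared 2-D parameter, models the MTL update as sequential task steps averaged over both task orders, and models the FL update as independent client steps from the same starting point whose deltas are summed (a rescaled average). All names (`A1`, `c1`, `sequential_delta`, and so on) are hypothetical. Under these assumptions the gap between the two updates equals $\tfrac{\eta^2}{2}\nabla_\theta(g_1 \cdot g_2)$, a step that increases the inner product of the task gradients.

```python
# Toy check (assumed setup, not the paper's derivation): two quadratic task
# losses L_i(theta) = 0.5 * (theta - c_i)^T A_i (theta - c_i) share theta.
# Gradient: g_i = A_i (theta - c_i); Hessian: A_i (symmetric).

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def scale(s, v):
    return [s * a for a in v]

def grad(A, c, theta):
    return matvec(A, sub(theta, c))

def sequential_delta(tasks, theta, eta):
    """Total parameter change after one gradient step per task, applied in order."""
    current = list(theta)
    for A, c in tasks:
        current = sub(current, scale(eta, grad(A, c, current)))
    return sub(current, theta)

# Hypothetical task definitions and starting point.
A1, c1 = [[2.0, 0.5], [0.5, 1.0]], [1.0, 0.0]
A2, c2 = [[1.0, -0.3], [-0.3, 3.0]], [0.0, 2.0]
task1, task2 = (A1, c1), (A2, c2)
theta, eta = [0.5, -1.0], 0.1

g1 = grad(A1, c1, theta)
g2 = grad(A2, c2, theta)

# FL-style: each client steps independently from theta; deltas are summed.
fl_delta = scale(-eta, add(g1, g2))

# MTL-style (assumed sequential), averaged over both task orders.
seq_mean = scale(0.5, add(sequential_delta([task1, task2], theta, eta),
                          sequential_delta([task2, task1], theta, eta)))

# Extra term separating the two updates.
extra = sub(seq_mean, fl_delta)

# Gradient of the inner product g1 . g2 with respect to theta: A1 g2 + A2 g1.
grad_inner = add(matvec(A1, g2), matvec(A2, g1))
predicted = scale(0.5 * eta ** 2, grad_inner)

print("extra term:    ", extra)
print("predicted term:", predicted)
```

For quadratic losses the two printed vectors agree up to floating-point rounding; for general losses the same relation would hold only up to higher-order terms in the step size $\eta$.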

Figures (7)

Figure 1: Comparison of different settings in FMTL. (a) Each client is dedicated to a single task. (b) Clients are grouped with peers, and peers in the same group share an identical task setting. (c) Our proposed HC-FMTL setting, which enables flexible collaboration among clients with different task setups.
Figure 2: Illustration of the HC-FMTL setting and our proposed FedHCA$^2$ framework. HC-FMTL enables clients to have different task setups, from single-task (e.g., clients $C_1$, $C_2$, $C_3$) to multi-task (e.g., clients $C_i$, $C_N$). HC-FMTL faces three main challenges: model incongruity due to different client model structures, data heterogeneity from different local data domains, and task heterogeneity from varied target tasks. The FL system includes a server and several clients. Our framework decomposes model aggregation into two parts: Hyper Conflict-Averse Aggregation for encoders and Hyper Cross Attention Aggregation for decoders. Learnable Hyper Aggregation Weights are employed to customize personalized parameter updates and are iteratively updated by local model updates from clients.
Figure 3: Comparison of optimization in MTL and FL. (a) The shared encoder in MTL is updated by gradient accumulation from all tasks. (b) The clients' encoders are updated independently and then aggregated in FL.
Figure 4: Evaluation results during training. (a) Parts from PASCAL-Context on single-task client. (b) Normals from NYUD-v2 on multi-task client.
Figure 5: Performance of different methods as the number of clients scales to 2 and 4 times. '$\Delta_m$' is calculated w.r.t. the corresponding local baseline of 1C, 2C, or 4C. As the number of clients increases, our method consistently provides superior performance, and an overall growth trend can be observed.
...and 2 more figures

Theorems & Definitions (1)

Theorem 1: Difference in optimizing MTL and FL
