FedReFT: Federated Representation Fine-Tuning with All-But-Me Aggregation

Fatema Siddika, Md Anwar Hossen, J. Pablo Muñoz, Tanya Roosta, Anuj Sharma, Ali Jannesari

TL;DR

FedReFT introduces a federated representation-fine-tuning framework that personalizes hidden-representation interventions through sparse, low-rank components and robust All-But-Me aggregation based on the geometric median. An adaptive, TTC-inspired mixing strategy balances local client-specific updates with global ABM knowledge, mitigating semantic misalignment under data/task heterogeneity. Empirical results across commonsense and arithmetic reasoning and GLUE demonstrate state-of-the-art accuracy with orders-of-magnitude reductions in trainable parameters, highlighting practical edge-device applicability. The approach advances privacy-conscious, communication-efficient FL by shifting tuning to representation space and employing robust aggregation, with broad potential for extension to other modalities and privacy-preserving enhancements.

Abstract

Parameter-efficient fine-tuning (PEFT) adapts large pre-trained models by updating only a small subset of parameters. Recently, Representation Fine-Tuning (ReFT) has emerged as an effective alternative. ReFT shifts the fine-tuning paradigm from updating model weights to directly manipulating hidden representations that capture rich semantic information, and it outperforms state-of-the-art PEFT methods in standalone settings. However, its application in Federated Learning (FL) remains challenging due to heterogeneity in clients' data distributions, model capacities, and computational resources. To address these challenges, we introduce Federated Representation Fine-Tuning (FedReFT), a novel approach to fine-tune clients' hidden representations. FedReFT applies sparse intervention layers to steer hidden representations directly, offering a lightweight and semantically rich fine-tuning alternative well suited to edge devices. However, representation-level updates are especially vulnerable to aggregation mismatch under task heterogeneity, where naive averaging can corrupt semantic alignment. To mitigate this issue, we propose All-But-Me (ABM) aggregation, in which each client receives the aggregated updates of the other clients and partially incorporates them, enabling stable and personalized learning by balancing local focus with global knowledge. We further design an adaptive update strategy inspired by Test-Time Computing (TTC) to balance local and global contributions under heterogeneous conditions. FedReFT achieves state-of-the-art performance on commonsense reasoning, arithmetic reasoning, and GLUE benchmarks, while delivering 1 to 49 times higher parameter efficiency than leading LoRA-based methods.
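The ABM step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each client's intervention parameters have been flattened into a plain list of floats, it computes the geometric median with the standard Weiszfeld iteration, and it uses a fixed mixing weight `lam` in place of the adaptive TTC-inspired rule, whose details are not given here. The function names `geometric_median`, `abm_aggregate`, and `mix` are hypothetical.

```python
import math


def geometric_median(points, n_iter=200, eps=1e-9):
    """Approximate the geometric median of `points` with Weiszfeld's iteration."""
    dim = len(points[0])
    # Start from the coordinate-wise mean.
    median = [sum(p[j] for p in points) / len(points) for j in range(dim)]
    for _ in range(n_iter):
        # Inverse-distance weights; clamp distances to avoid division by zero.
        weights = [1.0 / max(math.dist(p, median), eps) for p in points]
        total = sum(weights)
        new_median = [
            sum(w * p[j] for w, p in zip(weights, points)) / total
            for j in range(dim)
        ]
        shift = math.dist(new_median, median)
        median = new_median
        if shift < eps:
            break
    return median


def abm_aggregate(client_params, k):
    """Aggregate every client's parameters except those of client k."""
    others = [p for i, p in enumerate(client_params) if i != k]
    return geometric_median(others)


def mix(local, abm, lam=0.5):
    """Partially fuse the ABM aggregate into the local parameters.

    `lam` is a fixed, assumed mixing weight (not the paper's adaptive rule).
    """
    return [(1.0 - lam) * l + lam * a for l, a in zip(local, abm)]


# Three similar clients and one outlier (toy values, not from the paper).
clients = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [50.0, 50.0]]
abm_for_outlier = abm_aggregate(clients, k=3)
abm_for_first = abm_aggregate(clients, k=0)
```

In this toy setup the aggregate sent to the first client stays close to the majority value even though the outlier is included, which is the robustness property that motivates the geometric median over a plain mean.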

Paper Structure

This paper contains 37 sections, 18 equations, 5 figures, 20 tables.

Figures (5)

  • Figure 1: Average accuracy vs. trainable parameters (%) for federated PEFT methods on Arithmetic, Commonsense, and GLUE benchmarks using LLaMA-3 8B, LLaMA-3.2B, and RoBERTa-large models, respectively. FedReFT attains state-of-the-art accuracy while training far fewer parameters, improving communication efficiency and reducing transmission cost in FL.
  • Figure 2: FedReFT with ABM Aggregation. Clients working on different tasks personalize their interventions while remaining aligned with the global representation. (1)-(2): Each client applies LoReFT (Wu et al., 2024) interventions, training the learnable parameters {$W$, $R$, $b$} to modify hidden representations $h$ in a low-rank edit subspace. (3): Clients fine-tune {$W$, $R$, $b$} locally and partially fuse the received All-But-Me aggregated updates with their own. (4): The server performs ABM aggregation using the geometric median over the other clients’ intervention parameters to generate $W_k^{\text{ABM}}, R_k^{\text{ABM}}, b_k^{\text{ABM}}$.
  • Figure 3: Distance from the geometric median (source label: Study_Distance_from_GeoMed_ShapeColor).
  • Figure 4: Distance from the mean (source label: Study_Distance_from_Mean_ShapeColor).
  • Figure 5: Comparison of aggregation strategies across tasks. Results for FedAvg, Mean-ABM, and Geometric Median-ABM on Commonsense Reasoning, Arithmetic Reasoning, and GLUE show that Geometric Median-ABM consistently outperforms others, demonstrating greater robustness in heterogeneous federated settings.
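The intervention named in the Figure 2 caption can be illustrated with a small sketch. It assumes the LoReFT form from Wu et al. (2024), $\Phi(h) = h + R^\top (W h + b - R h)$, where $R$ has orthonormal rows spanning the low-rank edit subspace; that formula is not reproduced on this page, so treat it as an assumption. The helper names `matvec` and `loreft` and the toy values are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def loreft(h, R, W, b):
    """Apply h + R^T (W h + b - R h) to a hidden representation h.

    R and W are r x d matrices (lists of rows) and b has length r.
    R is assumed to have orthonormal rows.
    """
    target = [wh + bi for wh, bi in zip(matvec(W, h), b)]  # W h + b
    current = matvec(R, h)                                  # R h
    delta = [t - c for t, c in zip(target, current)]
    # Project the r-dimensional edit back to d dimensions with R^T.
    edit = [
        sum(R[i][j] * delta[i] for i in range(len(R)))
        for j in range(len(h))
    ]
    return [x + e for x, e in zip(h, edit)]


# Toy example: d = 3, r = 1, editing only the first coordinate.
h = [1.0, 2.0, 3.0]
R = [[1.0, 0.0, 0.0]]
W = [[0.0, 0.0, 0.0]]
b = [5.0]
edited = loreft(h, R, W, b)
```

Here the component of $h$ inside the subspace spanned by $R$ is replaced by $W h + b$, while the orthogonal components pass through unchanged, so only the small set {$W$, $R$, $b$} needs to be trained and exchanged.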