Adaptive Federated Fine-Tuning of Self-Supervised Speech Representations

Xin Guo; Chunrui Zhao; Hong Jia; Ting Dang; Gongping Huang; Xianrui Zheng; Yan Gao

Abstract

Integrating Federated Learning (FL) with self-supervised learning (SSL) enables privacy-preserving fine-tuning for speech tasks. However, federated environments exhibit significant heterogeneity: clients differ in computational capacity, which causes straggler effects under unified fine-tuning, and diverse downstream tasks require representations from different depths, which makes full-model updates inefficient. To address these challenges, we propose an adaptive federated fine-tuning framework with early exits. Lightweight prediction heads are inserted at intermediate layers of the SSL backbone, allowing each client to terminate computation early according to its local resource constraints and task requirements. We further introduce a layer-wise, depth-aware partial aggregation strategy to better exploit representations from different network depths. Experiments show that the framework reduces overhead on edge devices, supports heterogeneous hardware, and maintains competitive performance in resource-constrained federated environments.
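The client-side choice of where to exit can be illustrated with a small sketch. The abstract does not specify the selection rule, so everything below is an assumption: exits at hypothetical layers 4, 8, and 12, a linear cost model (per-layer cost plus one head), and a rule that picks the deepest exit that both fits the client's budget and does not exceed the depth the task needs. The function name `select_exit_layer` and all of its parameters are hypothetical.

```python
from typing import Optional, Sequence


def select_exit_layer(
    exit_layers: Sequence[int],
    per_layer_cost: float,
    head_cost: float,
    budget: float,
    task_max_depth: int,
) -> Optional[int]:
    """Return the deepest affordable exit layer, or None if none fits.

    Hypothetical cost model: training up to exit layer l costs
    l * per_layer_cost (backbone layers) + head_cost (one lightweight head).
    """
    best = None
    for layer in sorted(exit_layers):
        if layer > task_max_depth:
            break  # deeper representations are not needed for this task
        if layer * per_layer_cost + head_cost <= budget:
            best = layer  # affordable; keep looking for a deeper exit
    return best


# Usage: two hypothetical clients with different resource budgets.
for name, budget in {"phone": 5.0, "laptop": 20.0}.items():
    print(name, select_exit_layer([4, 8, 12], 1.0, 0.5, budget, task_max_depth=12))
```

With these made-up numbers the low-budget client stops at layer 4 and the high-budget client trains through layer 12; a task whose `task_max_depth` is 8 would cap even the high-budget client at layer 8.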

Paper Structure

This paper contains 15 sections, 1 equation, 1 figure, and 3 tables.

Introduction
Method
    Multi-Exit Elastic Backbone
    Resource and Task-Aware Local Training
    Layer-wise Depth-aware Partial Aggregation
Experimental Setup
    Datasets and Downstream Tasks
    Model Architecture and Early-Exit Configuration
    Federated Learning Environment and Heterogeneity
Experimental Results
    Layer-wise Performance Analysis under Centralized and Federated Settings
    Layer-wise Partial Aggregation for Heterogeneous Clients
    Memory Cost under Varying Model Depth
Conclusion
Generative AI Use Disclosure

Figures (1)

Figure 1: Overview of the proposed Adaptive Federated Learning framework. Heterogeneous clients dynamically select the training depth based on local resources. The server employs a Depth-weighted Layer-wise Partial Aggregation strategy, where deeper layers are updated exclusively by high-resource clients.
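The server-side idea in the caption, where each layer is averaged only over the clients that actually trained it, can be sketched as follows. This is an illustrative sketch, not the paper's aggregation rule: the exact depth weighting is not given here, so the weight w_k = n_k * d_k (sample count times training depth) is a hypothetical choice, and `partial_aggregate` and its arguments are made-up names. Parameters are flattened into plain Python lists to keep the example dependency-free.

```python
from typing import Dict, List

Layer = List[float]  # one layer's parameters, flattened (simplification)


def partial_aggregate(
    global_layers: List[Layer],
    client_layers: Dict[str, List[Layer]],
    client_depth: Dict[str, int],
    client_samples: Dict[str, int],
) -> List[Layer]:
    """Aggregate each layer only over clients whose training depth covers it.

    A client with depth d_k uploads layers 0 .. d_k - 1 only. The weight
    w_k = n_k * d_k is a hypothetical depth weighting, not the paper's formula.
    """
    new_layers: List[Layer] = []
    for l, g in enumerate(global_layers):
        contributors = [k for k, d in client_depth.items() if d > l]
        if not contributors:
            new_layers.append(list(g))  # no client reached this layer: keep global
            continue
        weights = {k: client_samples[k] * client_depth[k] for k in contributors}
        total = sum(weights.values())
        agg = [0.0] * len(g)
        for k in contributors:
            for i, v in enumerate(client_layers[k][l]):
                agg[i] += weights[k] / total * v  # weighted average per parameter
        new_layers.append(agg)
    return new_layers


# Usage: a shallow client "a" (1 layer) and a deep client "b" (3 layers).
print(partial_aggregate(
    [[0.0], [0.0], [0.0]],
    {"a": [[1.0]], "b": [[4.0], [2.0], [6.0]]},
    {"a": 1, "b": 3},
    {"a": 10, "b": 10},
))
```

In this toy run only the first layer mixes both clients; the deeper layers come entirely from the high-resource client, matching the caption's description. Layers that no client reached in a round simply keep their previous global values.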