FLea: Addressing Data Scarcity and Label Skew in Federated Learning via Privacy-preserving Feature Augmentation

Tong Xia; Abhirup Ghosh; Xinchi Qiu; Cecilia Mascolo

TL;DR

FLea tackles the twin FL challenges of data scarcity and label skew by sharing privacy-preserving intermediate activations through a global feature buffer and augmenting local training with mix-ups in representation space. It adds a distillation term to curb local drift and a distance-correlation loss to reduce leakage, while updating the global model via FedAvg-style aggregation. Across image, audio, and sensor datasets, FLea outperforms state-of-the-art FL baselines, improving accuracy by more than 5% in 13 of the 18 experimental settings, and it mitigates the privacy risks associated with feature sharing. The method offers a practical balance between improving global performance under data constraints and preserving client privacy in cross-device FL.
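To make the distance-correlation term concrete, the sketch below computes the standard sample distance correlation between a batch of inputs and their activations using NumPy. It is an illustration only, not the authors' code: the function names, array shapes, and the use of this quantity directly as a penalty are assumptions.

```python
import numpy as np

def _centered_distances(z):
    """Double-centered pairwise Euclidean distance matrix of a batch."""
    z = np.asarray(z, dtype=float).reshape(len(z), -1)  # flatten each sample
    diff = z[:, None, :] - z[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def distance_correlation(x, f):
    """Sample distance correlation between inputs x and activations f.

    Hypothetical helper: both arrays hold one sample per row (first axis).
    The result lies in [0, 1]; lower values mean f reveals less about x.
    """
    a = _centered_distances(x)
    b = _centered_distances(f)
    dcov2 = (a * b).mean()   # squared distance covariance
    dvar_x = (a * a).mean()  # squared distance variance of x
    dvar_f = (b * b).mean()  # squared distance variance of f
    denom = np.sqrt(dvar_x * dvar_f)
    if denom == 0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))
```

In a training loop, a differentiable version of this quantity could be added to the loss with a weighting coefficient so that the shared activations become less informative about the raw inputs; that weighting is a design choice not specified here.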

Abstract

Federated Learning (FL) enables model development by leveraging data distributed across numerous edge devices without transferring local data to a central server. However, existing FL methods still struggle when the data on each device is both scarce and label-skewed: local models overfit and drift, which in turn degrades the global model. To address these challenges, we propose FLea, a framework with the following key components: i) A global feature buffer that stores activation-target pairs shared from multiple clients to support local training. This design mitigates the local model drift caused by the absence of certain classes; ii) A feature augmentation approach for local training based on mix-ups of local and global activations. This strategy enlarges the set of training samples, thereby reducing the risk of local overfitting; iii) An obfuscation method that minimizes the correlation between intermediate activations and the source data, enhancing the privacy of shared features. To evaluate FLea, we conduct extensive experiments across a wide range of data modalities, simulating different levels of local data scarcity and label skew. The results show that FLea consistently outperforms state-of-the-art FL counterparts (in 13 of the 18 experimental settings, the improvement is over $5\%$) while also mitigating the privacy vulnerabilities associated with shared features. Code is available at https://github.com/XTxiatong/FLea.git
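As an illustration of component ii), the following sketch mixes a batch of local activations with activations drawn from a global buffer. It is a minimal example under stated assumptions, not the released implementation: the function name, the one-hot label format, the Beta-distributed mixing weight, and the default `alpha` are all hypothetical choices made here for clarity.

```python
import numpy as np

def mixup_features(local_f, local_y, global_f, global_y, alpha=2.0, rng=None):
    """Mix local activations with randomly chosen global-buffer activations.

    Assumptions (not from the paper): labels are one-hot arrays of shape
    (n, num_classes), and one mixing weight per sample is drawn from
    Beta(alpha, alpha).
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(local_f)
    # Pick one buffer entry for each local sample.
    idx = rng.integers(0, len(global_f), size=n)
    lam = rng.beta(alpha, alpha, size=n)
    # Reshape the weights so they broadcast over the feature dimensions.
    lam_f = lam.reshape(n, *([1] * (local_f.ndim - 1)))
    mixed_f = lam_f * local_f + (1.0 - lam_f) * global_f[idx]
    mixed_y = lam[:, None] * local_y + (1.0 - lam[:, None]) * global_y[idx]
    return mixed_f, mixed_y
```

Because the buffer can hold classes that a client lacks locally, the mixed targets expose the local model to those classes while also increasing the number of distinct training examples.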

Paper Structure (32 sections, 11 equations, 13 figures, 6 tables)

Introduction
Background
Fundamentals of FL
Addressing label skew in FL
Data scarcity in FL
Performance decline caused by data scarcity
Understanding the effect of data scarcity
Privacy-preserving feature sharing
FLea
Overview
Formulation of feature buffer
Client local training
Feature augmentation
Local training objective
Model aggregation and buffer updating
...and 17 more sections

Figures (13)

Figure 1: Edge devices as clients in federated learning, where local data exhibits label skew (represented by different markers) and scarcity (usually very small in size).
Figure 2: Performance of FL methods with increasing data scarcity levels (a smaller $|\mathcal{D}_k|$ indicates more severe scarcity).
Figure 3: t-SNE of low-dimensional features, with color distinguishing classes, and the class-separation measure DB under different numbers of training samples.
Figure 4: Data augmentations. From (a) to (c), the privacy vulnerability is reduced. (b) is the average of a batch of samples like (a), but if the local data contains individual context information (e.g., (a*)), averaging over those samples cannot protect such information (e.g., (b*)). (c) shows a feature of (a*) and (c*) shows its reconstruction.
Figure 5: Overview of FLea for the $t$-th communication round.
...and 8 more figures
