Partial Federated Learning
Tiantian Feng, Anil Ramakrishna, Jimit Majmudar, Charith Peris, Jixuan Wang, Clement Chung, Richard Zemel, Morteza Ziyadi, Rahul Gupta
TL;DR
The paper tackles the challenge of heterogeneous data modalities in federated learning by introducing PartialFL, a framework in which a subset of modalities (e.g., text) can be shared with the server while others (e.g., audio) remain on-device. PartialFL combines a server-side encoder trained on the shareable modality, a global FL model trained on the non-shareable data, and local edge models, and adds cross-modal and embedding alignment losses to transfer knowledge across modalities and devices. Training uses asynchronous alternating minimization with contrastive objectives, so labels never need to be shared. In experiments on SER and Food-101 datasets, PartialFL outperforms standard FL and SL baselines, approaches centralized performance, and is robust to data heterogeneity. The work advances privacy-preserving multi-modal FL by enabling larger, better-aligned embeddings trained across distributed modalities, and it discusses privacy risks and future deployment.
Abstract
Federated Learning (FL) is a popular approach for training machine learning models on user data that is constrained to edge devices (for example, mobile phones) due to privacy concerns. Typically, FL assumes that no part of the user data can leave the edge device. However, in many production settings, only specific data modalities or metadata must remain on device, while others need not. For example, in commercial spoken language understanding (SLU) systems, it is typically desirable to prevent transmission of biometric signals (such as audio recordings of the input prompt) to the cloud, but egress of text transcribed locally (i.e., on the edge device) may be possible. In this work, we propose a new algorithm called Partial Federated Learning (PartialFL), in which a machine learning model is trained on data for which a subset of modalities, or their intermediate representations, can be made available to the server. For better privacy, we further restrict model training by preventing the egress of data labels to the cloud, and instead use a contrastive-learning-based objective. We evaluate our approach on two different multi-modal datasets and show promising results.
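
To make the label-free contrastive objective more concrete, the following is a minimal sketch of how embeddings of a non-shareable modality (e.g., audio) might be aligned with embeddings of the shareable modality (e.g., text). The abstract does not specify the exact loss, so this sketch assumes a symmetric InfoNCE-style formulation; the function names, the `temperature` parameter, and the embedding shapes are all hypothetical and are not taken from the paper.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def contrastive_alignment_loss(private_emb, shared_emb, temperature=0.1):
    """Symmetric InfoNCE-style loss (an assumption, not the paper's exact loss).

    Row i of `private_emb` (e.g., an on-device audio embedding) and row i of
    `shared_emb` (e.g., a text embedding) come from the same sample and form
    the positive pair; all other rows in the batch act as negatives.
    No class labels are used.
    """
    p = l2_normalize(private_emb)
    s = l2_normalize(shared_emb)
    logits = p @ s.T / temperature            # (batch, batch) similarity matrix
    idx = np.arange(len(p))                   # positives lie on the diagonal
    loss_p2s = -log_softmax(logits, axis=1)[idx, idx].mean()  # private -> shared
    loss_s2p = -log_softmax(logits, axis=0)[idx, idx].mean()  # shared -> private
    return 0.5 * (loss_p2s + loss_s2p)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    text_emb = rng.normal(size=(8, 16))                       # stand-in for shared-modality embeddings
    audio_emb = text_emb + 0.01 * rng.normal(size=(8, 16))    # stand-in for well-aligned private embeddings
    print("aligned pairs:   ", contrastive_alignment_loss(audio_emb, text_emb))
    print("mismatched pairs:", contrastive_alignment_loss(np.roll(audio_emb, 1, axis=0), text_emb))
```

In this sketch the only supervision signal is which two embeddings belong to the same sample, which is why no labels have to leave the device. Correctly paired embeddings should yield a lower loss than mismatched ones.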
