Benchmarking Differential Privacy and Federated Learning for BERT Models

Priyam Basu, Tiasa Singha Roy, Rakshit Naidu, Zumrut Muftuoglu, Sahib Singh, Fatemehsadat Mireshghallah

TL;DR

The paper benchmarks Differential Privacy (DP) and Federated Learning (FL) for BERT-family models on Twitter data related to depression and harassment, comparing centralized DP, FL, and DP-FL under IID and Non-IID data. It analyzes four architectures (BERT, RoBERTa, DistilBERT, ALBERT) across privacy budgets, finding that smaller models are more robust to DP and that Non-IID FL suffers greater utility degradation than IID FL. The study offers insights into privacy-utility trade-offs for healthcare NLP and proposes directions for privacy-preserving federated NLP in medical contexts, accompanied by an open-source framework. Overall, the work advances understanding of how DP and FL interact in large-scale NLP models applied to sensitive mental-health data.

Abstract

Natural Language Processing (NLP) techniques can be applied to help with the diagnosis of medical conditions such as depression, using a collection of a person's utterances. Depression is a serious medical illness that can have adverse effects on how one feels, thinks, and acts, which can lead to emotional and physical problems. Due to the sensitive nature of such data, privacy measures need to be taken when handling it and training models on it. In this work, we study the effects that the application of Differential Privacy (DP) has, in both a centralized and a Federated Learning (FL) setup, on training contextualized language models (BERT, ALBERT, RoBERTa and DistilBERT). We offer insights on how to privately train NLP models and which architectures and setups provide more desirable privacy-utility trade-offs. We envisage this work being used in future healthcare and mental health studies to keep medical history private. Therefore, we provide an open-source implementation of this work.

Paper Structure

This paper contains 8 sections, 1 equation, 1 figure, and 3 tables.

Figures (1)

  • Figure 1: Pipeline of our benchmarking framework: We preprocess raw Twitter data, and then use it to run four sets of experiments comparing conventional training in a centralized setup, training with differential privacy in a centralized setup, training with federated learning in a distributed setup and finally, applying differential privacy to federated learning.
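
The two privacy mechanisms combined in the pipeline can be illustrated with a minimal sketch. It assumes that DP is applied in the common DP-SGD style (per-example gradient clipping plus Gaussian noise) and that federated aggregation follows FedAvg (a size-weighted average of client weights); this excerpt confirms neither choice, and the function names `clip`, `dp_average`, and `fed_avg` are hypothetical, not taken from the authors' open-source framework.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a gradient vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_average(per_example_grads, max_norm, noise_multiplier, rng):
    """DP-SGD style aggregate (assumed recipe, not the paper's code):
    clip each example's gradient, sum, add Gaussian noise, then average."""
    clipped = [clip(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # Noise standard deviation is proportional to the clipping bound.
    noisy = [s + rng.gauss(0.0, noise_multiplier * max_norm) for s in summed]
    return [v / len(clipped) for v in noisy]


def fed_avg(client_weights, client_sizes):
    """FedAvg (assumed aggregator): average client weight vectors,
    weighting each client by its local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.0, 0.5]]
    # Centralized DP: a noisy, clipped average gradient for one batch.
    print(dp_average(grads, max_norm=1.0, noise_multiplier=1.0, rng=rng))
    # FL: the server combines two clients' weights by dataset size.
    print(fed_avg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

In this sketch, the DP-FL configuration would correspond to each client computing its local updates with `dp_average` before the server calls `fed_avg`. Under Non-IID data, the client weight vectors passed to `fed_avg` would come from clients whose local label distributions differ.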