A Federated Learning Approach to Privacy Preserving Offensive Language Identification

Marcos Zampieri, Damith Premasiri, Tharindu Ranasinghe

TL;DR

Offensive language identification often relies on centralized data, raising privacy concerns. The paper proposes a privacy-preserving federated learning framework that uses model fusion to combine multiple dataset-specific transformer models, followed by targeted fine-tuning, to detect offensive content without sharing raw data. Experiments on four English benchmarks (AHSD, HASOC, HateXplain, OLID) show that fused models consistently outperform ensemble baselines and generalize across datasets, with initial multilingual results using XLM-R indicating cross-lingual applicability. The findings support a practical, privacy-preserving path for offensive language detection across diverse data sources and languages, and point to future work exploring other FL architectures and large language models in FL settings.

Abstract

The spread of various forms of offensive speech online is an important concern in social media. While platforms have been investing heavily in ways of coping with this problem, the question of privacy remains largely unaddressed. Models that detect offensive language on social media are trained and/or fine-tuned on large amounts of data, often stored on centralized servers. Since most social media data originates from end users, we propose a privacy-preserving decentralized architecture for identifying offensive language online by introducing Federated Learning (FL) in the context of offensive language identification. FL is a decentralized architecture that allows multiple models to be trained locally without the need for data sharing, hence preserving users' privacy. We propose a model fusion approach to perform FL. We trained multiple deep learning models on four publicly available English benchmark datasets (AHSD, HASOC, HateXplain, OLID) and evaluated their performance in detail. We also present initial cross-lingual experiments in English and Spanish. We show that the proposed model fusion approach outperforms baselines on all the datasets while preserving privacy.
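
To make the idea of model fusion concrete, the following is a minimal sketch of one common fusion strategy: weighted parameter averaging across locally trained models. This excerpt does not specify the exact fusion rule used in the paper, so the averaging rule is an assumption, and the function name `fuse_models`, the `StateDict` layout, and the toy parameter values are hypothetical. Only parameters are exchanged; no training text leaves its owner.

```python
from typing import Dict, List, Optional

# Hypothetical layout: parameter name -> flattened list of weights.
StateDict = Dict[str, List[float]]


def fuse_models(states: List[StateDict],
                weights: Optional[List[float]] = None) -> StateDict:
    """Fuse locally trained models by weighted parameter averaging.

    Assumption: all models share the same architecture, so every
    state dict has the same parameter names and shapes.
    """
    if not states:
        raise ValueError("need at least one model to fuse")
    if weights is None:
        weights = [1.0] * len(states)  # plain (unweighted) average
    if len(weights) != len(states):
        raise ValueError("one weight per model is required")

    total = sum(weights)
    fused: StateDict = {}
    for name, reference in states[0].items():
        fused[name] = [
            sum(w * s[name][i] for w, s in zip(weights, states)) / total
            for i in range(len(reference))
        ]
    return fused


# Toy parameters standing in for two dataset-specific models.
model_a = {"classifier.weight": [0.25, 0.5], "classifier.bias": [0.0]}
model_b = {"classifier.weight": [0.75, 0.0], "classifier.bias": [1.0]}

fused = fuse_models([model_a, model_b])
print(fused)
```

In a pipeline like the one the abstract describes, the fused parameters would then be loaded into a model of the same architecture and fine-tuned further; that step is omitted here.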

Paper Structure

This paper contains 9 sections, 2 equations, 2 figures, and 4 tables.

Figures (2)

  • Figure 1: The three stages of the FL pipeline in the proposed fused model.
  • Figure 2: A sample transformer model for offensive language identification (Ranasinghe et al., 2020) predicting offensive and not offensive labels.