SafeSpeech: A Comprehensive and Interactive Tool for Analysing Sexist and Abusive Language in Conversations

Xingwei Tan, Chen Lyu, Hafiz Muhammad Umer, Sahrish Khan, Mahathi Parvatham, Lois Arthurs, Simon Cullen, Shelley Wilson, Arshad Jhumka, Gabriele Pergola

TL;DR

SafeSpeech addresses the gap between message-level toxicity detection and conversation-level analysis by unifying fine-tuned classifiers and large language models (LLMs) to deliver multi-granularity detection, toxic-aware summarization, and persona profiling. It introduces a soft-explanation mechanism via perplexity gain, defined as $R(s_i, Y|X;\theta) = PPL(Y|X_{\setminus s_i};\theta) - PPL(Y|X;\theta)$, to identify the linguistic contributors to predictions. The system is evaluated on benchmarks including EDOS, OffensEval, HatEval, and AbusEval, reproducing state-of-the-art performance on sexism, hate, and abusive-language subtasks, and it also provides summarization and persona modules. SafeSpeech offers an interactive, prompt-driven interface designed to facilitate research and moderation while highlighting ethical and privacy considerations for deployment.

Abstract

Detecting toxic language, including sexism, harassment, and abusive behaviour, remains a critical challenge, particularly in its subtle and context-dependent forms. Existing approaches largely focus on isolated message-level classification, overlooking toxicity that emerges across conversational contexts. To promote and enable future research in this direction, we introduce SafeSpeech, a comprehensive platform for toxic content detection and analysis that bridges message-level and conversation-level insights. The platform integrates fine-tuned classifiers and large language models (LLMs) to enable multi-granularity detection, toxic-aware conversation summarization, and persona profiling. SafeSpeech also incorporates explainability mechanisms, such as perplexity gain analysis, to highlight the linguistic elements driving predictions. Evaluations on benchmark datasets, including EDOS, OffensEval, and HatEval, demonstrate the reproduction of state-of-the-art performance across multiple tasks, including fine-grained sexism detection.
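
The perplexity gain analysis mentioned above compares the perplexity of a response Y with and without one input segment s_i in the context X. The sketch below illustrates that difference under stated assumptions: perplexity is taken as the usual exponential of the negative mean token log-probability, and the model is replaced by a hypothetical toy scorer (`toy_score`) so the example runs without any LLM. The names `Scorer`, `perplexity_gains`, and `toy_score` are illustrative and do not come from the SafeSpeech codebase.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical scorer interface (an assumption for this sketch): given the
# context segments X and a response Y, return one log-probability per
# response token. A real system would query an LLM here.
Scorer = Callable[[Sequence[str], str], List[float]]


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity as exp of the negative mean token log-probability."""
    if not token_logprobs:
        raise ValueError("response must contain at least one token")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def perplexity_gains(
    segments: Sequence[str], response: str, score: Scorer
) -> List[float]:
    """Compute PPL(Y | X without s_i) - PPL(Y | X) for every segment s_i."""
    base = perplexity(score(segments, response))
    gains = []
    for i in range(len(segments)):
        # Ablate segment i and re-score the same response.
        ablated = list(segments[:i]) + list(segments[i + 1:])
        gains.append(perplexity(score(ablated, response)) - base)
    return gains


def toy_score(context: Sequence[str], response: str) -> List[float]:
    """Toy stand-in for an LLM: a response word gets probability 0.9 if it
    also appears in the context, and 0.1 otherwise."""
    seen = {w.lower() for seg in context for w in seg.split()}
    return [
        math.log(0.9 if w.lower() in seen else 0.1) for w in response.split()
    ]


if __name__ == "__main__":
    conversation = ["you never listen", "nice weather today", "stop talking now"]
    summary = "controlling language: never listen stop talking"
    for segment, gain in zip(
        conversation, perplexity_gains(conversation, summary, toy_score)
    ):
        print(f"{gain:8.3f}  {segment}")
```

With the toy scorer, removing the first or third segment makes the response harder to predict, so both receive a positive gain, while the unrelated middle segment receives a gain of zero. Plotted per segment, scores of this kind could be rendered as a heatmap like the one described for Figure 2.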

Paper Structure

This paper contains 29 sections, 1 equation, 13 figures, 6 tables.

Figures (13)

  • Figure 1: The structure of the SafeSpeech platform.
  • Figure 2: An example of perplexity gain analysis on a synthetically generated conversation. Based on the response, we computed perplexity gains and represented them as a heatmap.
  • Figure 3: An example of persona analysis generated by Llama3.1 based on the conversation summary and the Big Five personality traits framework.
  • Figure 4: An example conversation about harassment.
  • Figure 5: An example conversation involving controlling and coercive behaviours.
  • ...and 8 more figures