Safe Guard: an LLM-agent for Real-time Voice-based Hate Speech Detection in Social Virtual Reality

Yiwen Xu; Qinyang Hou; Hongyu Wan; Mirjana Prpa

TL;DR

Safe Guard tackles real-time voice-based hate speech detection in social VR by fusing GPT-3.5-based text analysis with a CNN that processes audio features: root-mean-square (RMS) energy and 40-dimensional mel-frequency cepstral coefficients (MFCCs). The embodied agent operates in conversational and observational modes within VRChat. It employs three prompting strategies (direct, definition-based, and few-shot) and a strict multimodal decision rule that acts only when both modalities indicate hate speech, thereby reducing false positives. Using the HateMM dataset and VRChat experiments, the study shows that the combined approach achieves high precision (≈0.96) and a low false positive rate (FPR ≈0.01), with an average latency of around 1.5 seconds, which supports the practicality of real-time multimodal moderation in immersive environments. The work demonstrates the viability of integrating LLM-driven moderation with audio cues to enhance safety in social VR and provides a foundation for future multimodal safety systems.
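
The strict decision rule summarized above can be illustrated with a minimal sketch. It assumes each modality has already been reduced to a boolean verdict; the class and function names are hypothetical and are not taken from the paper's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModalityVerdicts:
    """Per-utterance verdicts (hypothetical field names, for illustration)."""
    text_flags_hate: bool   # verdict from the LLM-based text analysis
    audio_flags_hate: bool  # verdict from the audio-feature CNN


def should_act(verdicts: ModalityVerdicts) -> bool:
    """Act only when BOTH modalities indicate hate speech (logical AND)."""
    return verdicts.text_flags_hate and verdicts.audio_flags_hate


# A single-modality alarm is not enough to trigger moderation.
text_only = ModalityVerdicts(text_flags_hate=True, audio_flags_hate=False)
both = ModalityVerdicts(text_flags_hate=True, audio_flags_hate=True)
print(should_act(text_only), should_act(both))
```

Under an AND rule, a false positive requires both models to err on the same utterance, which is why false positives tend to drop; the trade-off is that hate speech caught by only one modality goes unflagged.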

Abstract

In this paper, we present Safe Guard, an LLM-agent for detecting hate speech in voice-based interactions in social VR (VRChat). Our system leverages OpenAI GPT and audio feature extraction for real-time voice interactions. We contribute a system design and an evaluation that demonstrates the capability of our approach to detect hate speech and to reduce false positives compared to currently available approaches. Our results indicate the potential of LLM-based agents in creating safer virtual environments and lay the groundwork for further advancements in LLM-driven moderation approaches.

Paper Structure (36 sections, 5 equations, 5 figures, 3 tables)

Introduction
Literature Review
Defining Harassment and Hate Speech in the Context of Social VR
Moderation of Harassment in Social VR
LLMs for RT Voice-based Hate Speech Detection
Improving the Accuracy of LLM Using Audio Feature Analysis for Hate Speech Detection
Methodology
Hate Speech Training and Testing Datasets
LLM Set Up and Prompt with Hate Speech Moderation Rules
Convolutional Neural Network Audio Feature Model
System Design
Safe Guard Agent Design
Hate Speech Detection System
Prompt Engineering for LLM Model
Approach #1: Direct Prompting
...and 21 more sections

Figures (5)

Figure 1: Safe Guard System Design in Conversational Mode
Figure 2: GPT Model Alone
Figure 3: Audio Feature Model Alone
Figure 4: Combined Model
Figure 5: Overall Latency Distribution
