Generating A Crowdsourced Conversation Dataset to Combat Cybergrooming
Xinyi Zhang, Pamela J. Wisniewski, Jin-hee Cho, Lifu Huang, Sang Won Lee
TL;DR
The paper addresses cybergrooming risks by highlighting gaps in detection-based approaches and outdated datasets. It proposes a crowdsourced data collection framework that engages both parents and adolescents in simulated cybergrooming scenarios. The collected data are intended to train generative NLP models that produce educational conversations, and to reveal differences in risk perception and coping behaviors between the two groups. By qualitatively analyzing four types of data (parents' vulnerable and resilient responses, and adolescents' vulnerable and resilient responses), the work aims to develop robust educational agents and interfaces that empower youth while mitigating privacy and ethical concerns. The approach offers a practical path toward large-scale, authentic datasets for educating youth about cybergrooming and informs the design of protective conversational tools and curricula.
Abstract
Cybergrooming is a growing threat to adolescent safety and mental health. One way to combat cybergrooming is to leverage predictive artificial intelligence (AI) to detect predatory behaviors on social media. However, these methods face challenges such as false positives and raise concerns such as privacy. A complementary strategy is to use generative AI to empower adolescents by educating them about predatory behaviors. To this end, we envision developing state-of-the-art conversational agents that simulate conversations between adolescents and predators for educational purposes. Yet one key challenge is the lack of a dataset to train such conversational agents. In this position paper, we present our motivation for empowering adolescents to cope with cybergrooming. We propose to develop large-scale, authentic datasets through an online survey targeting adolescents and parents. We discuss the background behind our motivation and the proposed survey design, which situates participants in simulated cybergrooming scenarios and then asks them to respond, so that we capture their authentic responses. We also present several open questions about our proposed approach that we hope to discuss with workshop attendees.
