SugarTextNet: A Transformer-Based Framework for Detecting Sugar Dating-Related Content on Social Media with Context-Aware Focal Loss

Lionel Z. Wang, Shihan Ben, Yulu Huang, Simeng Qin

TL;DR

SugarTextNet addresses the challenge of detecting sugar dating-related content on social media, a task hindered by euphemistic language and severe class imbalance. The authors propose a transformer-based framework that fuses a RoBERTa-backed encoder, an attention-based cue extractor, and a Bi-GRU contextual encoder, together with Context-Aware Focal Loss to boost minority-class recall. On a curated Sina Weibo dataset of 3,067 Chinese posts, SugarTextNet substantially outperforms traditional ML/DL baselines and large language models, with ablations confirming the necessity of each component. The work highlights the value of domain-specific, context-aware modeling for sensitive content moderation and offers practical guidance for deployment and risk mitigation in real-world settings.

Abstract

Sugar dating-related content has rapidly proliferated on mainstream social media platforms, giving rise to serious societal and regulatory concerns, including commercialization of intimate relationships and the normalization of transactional relationships. Detecting such content is highly challenging due to the prevalence of subtle euphemisms, ambiguous linguistic cues, and extreme class imbalance in real-world data. In this work, we present SugarTextNet, a novel transformer-based framework specifically designed to identify sugar dating-related posts on social media. SugarTextNet integrates a pretrained transformer encoder, an attention-based cue extractor, and a contextual phrase encoder to capture both salient and nuanced features in user-generated text. To address class imbalance and enhance minority-class detection, we introduce Context-Aware Focal Loss, a tailored loss function that combines focal loss scaling with contextual weighting. We evaluate SugarTextNet on a newly curated, manually annotated dataset of 3,067 Chinese social media posts from Sina Weibo, demonstrating that our approach substantially outperforms traditional machine learning models, deep learning baselines, and large language models across multiple metrics. Comprehensive ablation studies confirm the indispensable role of each component. Our findings highlight the importance of domain-specific, context-aware modeling for sensitive content detection, and provide a robust solution for content moderation in complex, real-world scenarios.
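The abstract describes Context-Aware Focal Loss only as focal loss scaling combined with contextual weighting; its exact form is not given here. As a rough illustration of the idea, the sketch below implements the standard binary focal loss for a single example and multiplies it by a per-example weight. The function name `focal_loss`, the `context_weight` parameter, and the default values of `gamma` and `alpha` are assumptions made for illustration, not the authors' formulation.

```python
import math


def focal_loss(p_pos, label, gamma=2.0, alpha=0.75, context_weight=1.0):
    """Standard binary focal loss for one example, times an extra weight.

    p_pos          -- predicted probability of the positive (minority) class
    label          -- 1 for positive, 0 for negative
    gamma          -- focusing parameter; larger values down-weight easy examples
    alpha          -- class-balance weight applied to the positive class
    context_weight -- hypothetical per-example weight standing in for the
                      paper's contextual weighting (an assumption)
    """
    # Probability the model assigns to the true class.
    p_t = p_pos if label == 1 else 1.0 - p_pos
    alpha_t = alpha if label == 1 else 1.0 - alpha
    # Clamp to avoid log(0).
    p_t = min(max(p_t, 1e-12), 1.0)
    # (1 - p_t)^gamma shrinks the loss of well-classified examples.
    return -context_weight * alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A confidently correct positive contributes far less than a misclassified one.
easy = focal_loss(0.9, 1)
hard = focal_loss(0.3, 1)
```

With `gamma=0` and `alpha=0.5` the expression reduces to half the ordinary cross-entropy, which is one way to check that the focal term is the only thing changing the weighting.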

Paper Structure

This paper contains 38 sections, 13 equations, 4 tables.