Enhancing Large Language Models for Detecting Mental Manipulation via Annotation-Free Data Augmentation and Anti-Curriculum Distillation
Yuansheng Gao, Han Bao, Tong Zhang, Bin Li, Jixiang Luo, Ronghao Chen, Zonghui Wang, Wenzhi Chen
TL;DR
MentalMAC tackles the detection of mental manipulation in multi-turn dialogues by combining annotation-free data augmentation (EvoSA), teacher-generated multi-task supervision, and progressive task-level anti-curriculum distillation. It also introduces ReaMent, a dataset of 5,000 real-world dialogues for robust evaluation. In experiments, MentalMAC yields substantial gains over baselines and enables smaller models to approach or surpass large LLMs on this task, highlighting the practical potential of data-efficient, curriculum-aware training for detecting covert manipulation.
Abstract
Mental manipulation is a subtle yet pervasive form of psychological abuse that poses serious threats to mental health. Nevertheless, detecting mental manipulation remains a largely underexplored research problem. The field faces three major challenges: (i) training data that is scarce and hard to obtain; (ii) the covert nature of mental manipulation, which hinders detection; and (iii) the lack of real-world datasets. To address these challenges, we propose MentalMAC, a novel framework that enhances the ability of large language models to detect elements of mental manipulation in multi-turn dialogue. Our approach consists of three key components: EvoSA, an annotation-free data augmentation method based on evolutionary operations and speech act theory; multi-task supervision generated by a teacher model; and progressive task-level anti-curriculum distillation. We further construct the ReaMent dataset, comprising 5,000 real-world dialogue samples, using MentalMAC-distilled models to assist human annotation. Extensive experiments show that MentalMAC achieves up to a 25.9% improvement in macro-averaged F1 (F1mac) and 8.1% in accuracy over the best-performing baseline, outperforming commercial LLMs such as GPT-4 and Claude-3.5-Sonnet. Warning: This paper contains content that may be offensive to the reader.
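To make the idea of task-level anti-curriculum ordering concrete, the following is a minimal sketch of a generic hardest-first schedule, not the authors' implementation. The task names, the numeric difficulty scores, and the assumption that "progressive" means each stage keeps the earlier tasks and adds the next-easiest one are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str          # hypothetical task name, for illustration only
    difficulty: float  # hypothetical score; higher means harder


def anti_curriculum_stages(tasks):
    """Return cumulative training stages ordered hardest-first.

    Assumption: stage 1 trains on the hardest task only, and each later
    stage adds the next-easiest task while keeping the earlier ones.
    """
    # Anti-curriculum: sort by difficulty in descending order (hard -> easy).
    ordered = sorted(tasks, key=lambda t: t.difficulty, reverse=True)
    # Stage k contains the k hardest tasks.
    return [[t.name for t in ordered[: k + 1]] for k in range(len(ordered))]


tasks = [
    Task("binary_detection", 0.3),
    Task("technique_identification", 0.7),
    Task("rationale_generation", 0.9),
]

for i, stage in enumerate(anti_curriculum_stages(tasks), start=1):
    print(i, stage)
```

A conventional curriculum would simply sort in ascending order; flipping that single comparison is what distinguishes the anti-curriculum variant in this sketch.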
