Enhanced Detection of Conversational Mental Manipulation Through Advanced Prompting Techniques
Ivory Yang, Xiaobo Guo, Sean Xie, Soroush Vosoughi
TL;DR
The paper studies how well prompting techniques let large language models detect dialogical mental manipulation. It systematically compares Zero-Shot, Few-Shot, and Chain-of-Thought (CoT) prompting, the latter both with and without examples, on the MentalManipCon dataset using GPT-3.5 and GPT-4o, and reports precision, recall, and F1. A key finding is that Few-Shot CoT achieves the highest accuracy, while CoT without examples can underperform, particularly for more capable models, and exhibits notable biases such as overemphasizing verbal cues. The work contributes insights into prompt design for manipulation detection and outlines future directions for a long-term research agenda, including iterative and self-consistent prompting and bias-aware analysis.
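As a reminder of how the three reported metrics relate on a binary task, here is a minimal sketch in Python. The function name `binary_prf` and the 0/1 label encoding (1 = manipulative) are assumptions made for illustration; they are not taken from the paper.

```python
from typing import Sequence, Tuple


def binary_prf(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for the positive class, encoded as 1."""
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1
```

A prompt that labels nearly every dialogue as manipulative would score high recall but low precision under this definition, which is why the three numbers are read together.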
Abstract
This study presents a comprehensive, long-term project to explore the effectiveness of various prompting techniques in detecting dialogical mental manipulation. We implement Chain-of-Thought prompting with Zero-Shot and Few-Shot settings on a binary mental manipulation detection task, building upon existing work conducted with Zero-Shot and Few-Shot prompting. Our primary objective is to decipher why certain prompting techniques display superior performance, so as to craft a novel framework tailored for detection of mental manipulation. Preliminary findings suggest that advanced prompting techniques may not be suitable for more complex models if they are not trained through example-based learning.
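The four settings compared here (Zero-Shot, Few-Shot, Zero-Shot CoT, Few-Shot CoT) differ only in whether worked examples are included and whether the model is asked to reason before answering. The sketch below shows one way such prompts could be assembled. The task wording, the "Let's think step by step." cue, the `build_prompt` helper, and the example format are all hypothetical and are not the authors' actual prompts.

```python
from typing import Sequence, Tuple

# Hypothetical task instruction; the paper's exact wording is not reproduced here.
TASK = "Does the following dialogue contain mental manipulation? Answer Yes or No."
# A commonly used zero-shot CoT trigger phrase, assumed for illustration.
COT_CUE = "Let's think step by step."

# Each example is (dialogue, reasoning, label); reasoning is used only in CoT mode.
Example = Tuple[str, str, str]


def build_prompt(
    dialogue: str,
    examples: Sequence[Example] = (),
    chain_of_thought: bool = False,
) -> str:
    """Assemble a prompt for one of the four settings.

    examples empty, chain_of_thought False -> Zero-Shot
    examples given, chain_of_thought False -> Few-Shot
    examples empty, chain_of_thought True  -> Zero-Shot CoT
    examples given, chain_of_thought True  -> Few-Shot CoT
    """
    parts = [TASK]
    for ex_dialogue, ex_reasoning, ex_label in examples:
        block = f"Dialogue:\n{ex_dialogue}\n"
        if chain_of_thought:
            # Few-Shot CoT shows the reasoning that leads to each label.
            block += f"Reasoning: {ex_reasoning}\n"
        block += f"Answer: {ex_label}"
        parts.append(block)
    parts.append(f"Dialogue:\n{dialogue}")
    # CoT settings end with a reasoning cue; the others ask for the label directly.
    parts.append(COT_CUE if chain_of_thought else "Answer:")
    return "\n\n".join(parts)
```

Under these assumptions, calling `build_prompt(d)` gives the Zero-Shot prompt and `build_prompt(d, examples, chain_of_thought=True)` gives the Few-Shot CoT prompt, so the four conditions can be generated from the same data by toggling two arguments.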
