SELF-PERCEPT: Introspection Improves Large Language Models' Detection of Multi-Person Mental Manipulation in Conversations

Danush Khanna, Pratinav Seth, Sidhaarth Sredharan Murali, Aditya Kumar Guru, Siddharth Shukla, Tanuj Tyagi, Sandeep Chaurasia, Kripabandhu Ghosh

TL;DR

This work tackles the problem of detecting mental manipulation in complex, multi-turn, multi-person conversations by introducing the MultiManip dataset and a Self-Perception Theory–inspired prompting framework, SELF-PERCEPT. The two-stage method first observes behavioral cues and then infers latent attitudes to identify manipulation with improved precision-recall balance across GPT-4o and Llama-3.1-8B, outperforming baseline prompting strategies on both MultiManip and MentalManip datasets. The dataset, collected from reality-show transcripts and annotated for 11 manipulation techniques, provides a balanced and challenging benchmark for evaluating multi-party manipulation detection. The study highlights the potential of behaviorally grounded prompt reasoning to enhance interpretability and effectiveness in real-world conversational safety applications, while noting limitations and the need for broader, diverse data and responsible deployment guidelines.

Abstract

Mental manipulation is a subtle yet pervasive form of abuse in interpersonal communication, making its detection critical for safeguarding potential victims. However, due to manipulation's nuanced and context-specific nature, identifying manipulative language in complex, multi-turn, and multi-person conversations remains a significant challenge for large language models (LLMs). To address this gap, we introduce the MultiManip dataset, comprising 220 multi-turn, multi-person dialogues balanced between manipulative and non-manipulative interactions, all drawn from reality shows that mimic real-world scenarios. For manipulative interactions, it includes 11 distinct manipulation techniques depicting real-life scenarios. We conduct extensive evaluations of state-of-the-art LLMs, such as GPT-4o and Llama-3.1-8B, employing various prompting strategies. Despite their capabilities, these models often struggle to detect manipulation effectively. To overcome this limitation, we propose SELF-PERCEPT, a novel, two-stage prompting framework inspired by Self-Perception Theory, demonstrating strong performance in detecting multi-person, multi-turn mental manipulation. Our code and data are publicly available at https://github.com/danushkhanna/self-percept.

Paper Structure

This paper contains 37 sections, 5 equations, 3 figures, 8 tables.

Figures (3)

  • Figure 1: An example of a manipulative conversation from the MultiManip Dataset, including illustrations of the Proposed SELF-PERCEPT Prompting Method and outputs from both SELF-PERCEPT and K-shot GPT-4o.
  • Figure 2: Taxonomy of various Mental Manipulation Techniques. Description of Techniques in Table \ref{tab:appn-t1}.
  • Figure 3: Top SHAP Contributions from SPT Stage 1 and CoT