The Company You Keep: How LLMs Respond to Dark Triad Traits

Zeyi Lu, Angelica Henestrosa, Pavel Chizhov, Ivan P. Yamshchikov

TL;DR

This study uses a curated dataset to examine how LLMs respond to user prompts expressing varying degrees of Dark Triad traits. Behavior differs across models: all models predominantly exhibit corrective behavior but produce reinforcing output in certain cases.

Abstract

Large Language Models (LLMs) often exhibit highly agreeable and reinforcing conversational styles, a tendency known as AI sycophancy. Although this behavior is encouraged, it may become problematic when models respond to user prompts that reflect negative social tendencies. Such responses risk amplifying harmful behavior rather than mitigating it. In this study, we use a curated dataset to examine how LLMs respond to user prompts expressing varying degrees of Dark Triad traits (Machiavellianism, Narcissism, and Psychopathy). Our analysis reveals differences across models: all models predominantly exhibit corrective behavior but produce reinforcing output in certain cases. Model behavior also depends on the severity level, and the sentiment of the responses differs. Our findings have implications for designing safer conversational systems that can detect and respond appropriately when users escalate from benign to harmful requests.

Paper Structure (37 sections, 11 figures, 11 tables)

Figures (11)

  • Figure 1: Classification distribution across Dark Triad traits for four models. Commercial models (Claude, GPT-5) maintain less than 2% reinforcement across all traits; open-source models show 3–15%. Qwen exhibits 14.75% reinforcement for Machiavellianism, the highest trait-model failure rate. See Table \ref{tab:app_trait_model} in Appendix \ref{app:results_tables} for exact values.
  • Figure 2: Classification distribution across severity levels (Low, Medium, High) for four models. A reverse severity gradient emerges: Low severity shows 9.38% reinforcement vs. 0% at High severity. See Table \ref{tab:app_severity_model} in Appendix \ref{app:results_tables} for exact values.
  • Figure 3: Classification distribution across five contextual settings for four models. Claude maintains 0% reinforcement across all contexts. Open-source models show 2–3× variation, with Qwen highest in Workplace (11.90%) and Personal-Family (11.11%). See Table \ref{tab:app_context_model} in Appendix \ref{app:results_tables} for exact values.
  • Figure 4: Emotion intensity (caring, disapproval, approval, annoyance) in CORRECTIVE responses across models. See Table \ref{tab:app_emotion} in Appendix \ref{app:results_tables} for exact values and Appendix \ref{app:emotions} for an extended analysis of emotions.
  • Figure 5: Generation template used to expand SD3 behavioral descriptions into user prompts.
  • ...and 6 more figures