Table of Contents
Fetching ...

Exploring Human-LLM Conversations: Mental Models and the Originator of Toxicity

Johannes Schneider, Arianna Casanova Flores, Anne-Catherine Kranz

TL;DR

This work analyzes unconstrained human-LLM conversations from the LMSYS-Chat-1M corpus to understand how users conceptualize AI partners and where toxicity originates. By combining dictionary-based politeness metrics, reading-ease measures, and a moderation API with manual coding, the authors show that toxicity is predominantly triggered by humans and that users shift from machine- to human-like mental models as dialogues unfold. The study identifies clear patterns in politeness, personalization, and prompt length that accompany this shift, and it characterizes the dynamics and triggers of toxic content, including roleplay contexts and misclassification issues. The findings have implications for AI governance, design of user interfaces, and regulatory approaches by illustrating that user behavior and expectations shape both dialogue quality and toxicity risk.

Abstract

This study explores real-world human interactions with large language models (LLMs) in diverse, unconstrained settings in contrast to most prior research focusing on ethically trimmed models like ChatGPT for specific tasks. We aim to understand the originator of toxicity. Our findings show that although LLMs are rightfully accused of providing toxic content, it is mostly demanded or at least provoked by humans who actively seek such content. Our manual analysis of hundreds of conversations judged as toxic by APIs commercial vendors, also raises questions with respect to current practices of what user requests are refused to answer. Furthermore, we conjecture based on multiple empirical indicators that humans exhibit a change of their mental model, switching from the mindset of interacting with a machine more towards interacting with a human.

Exploring Human-LLM Conversations: Mental Models and the Originator of Toxicity

TL;DR

This work analyzes unconstrained human-LLM conversations from the LMSYS-Chat-1M corpus to understand how users conceptualize AI partners and where toxicity originates. By combining dictionary-based politeness metrics, reading-ease measures, and a moderation API with manual coding, the authors show that toxicity is predominantly triggered by humans and that users shift from machine- to human-like mental models as dialogues unfold. The study identifies clear patterns in politeness, personalization, and prompt length that accompany this shift, and it characterizes the dynamics and triggers of toxic content, including roleplay contexts and misclassification issues. The findings have implications for AI governance, design of user interfaces, and regulatory approaches by illustrating that user behavior and expectations shape both dialogue quality and toxicity risk.

Abstract

This study explores real-world human interactions with large language models (LLMs) in diverse, unconstrained settings in contrast to most prior research focusing on ethically trimmed models like ChatGPT for specific tasks. We aim to understand the originator of toxicity. Our findings show that although LLMs are rightfully accused of providing toxic content, it is mostly demanded or at least provoked by humans who actively seek such content. Our manual analysis of hundreds of conversations judged as toxic by APIs commercial vendors, also raises questions with respect to current practices of what user requests are refused to answer. Furthermore, we conjecture based on multiple empirical indicators that humans exhibit a change of their mental model, switching from the mindset of interacting with a machine more towards interacting with a human.
Paper Structure (14 sections, 14 figures, 1 table)

This paper contains 14 sections, 14 figures, 1 table.

Figures (14)

  • Figure 1: Commercial models like OpenAI's GPT4o tend to prefer denying users prompts' in favor of mitigating risk of toxic responses
  • Figure 2: Example where a human went from communication style typical for interacting with a machine towards one more prevalent for humans as indicated by the use of politeness and shorter prompts within a single conversation.
  • Figure 3: Frequency of toxic turns with score > 0.25 and their percentage of all turns (including non-toxic)
  • Figure 4: Conjectured change of mental model during conversation: After the initial prompt, humans tend to shift from a mental model typical for machine-interation to one typical for human-interaction.
  • Figure 5: Asking for requests politely using 'please'
  • ...and 9 more figures