EmoAgent: Assessing and Safeguarding Human-AI Interaction for Mental Health Safety

Jiahao Qiu, Yinghui He, Xinzhe Juan, Yimin Wang, Yuhan Liu, Zixin Yao, Yue Wu, Xun Jiang, Ling Yang, Mengdi Wang

TL;DR

EmoAgent tackles safety risks in human‑AI mental health interactions by coupling EmoEval, a virtual‑patient evaluation pipeline that uses CCD‑based cognitive models and validated instruments (PHQ‑9, PDI, PANSS), with EmoGuard, a real‑time safeguard that monitors users and guides dialogue. The framework reveals that emotionally engaging, character‑based agents can cause deterioration in vulnerable users in a substantial fraction of simulations, while EmoGuard significantly reduces such risk through iterative, in‑conversation interventions. Across multiple character personas and styles, EmoEval quantifies risk patterns and identifies common deterioration drivers, providing actionable guidance for safer design. The work demonstrates a practical path toward safer AI‑human interactions in mental health contexts, and the authors provide code for replication and further validation.

Abstract

The rise of LLM-driven AI characters raises safety concerns, particularly for vulnerable human users with psychological disorders. To address these risks, we propose EmoAgent, a multi-agent AI framework designed to evaluate and mitigate mental health hazards in human-AI interactions. EmoAgent comprises two components. EmoEval simulates virtual users, including those portraying mentally vulnerable individuals, to assess mental health changes before and after interactions with AI characters. It uses clinically proven psychological and psychiatric assessment tools (PHQ-9, PDI, PANSS) to evaluate mental risks induced by LLMs. EmoGuard serves as an intermediary, monitoring users' mental status, predicting potential harm, and providing corrective feedback to mitigate risks. Experiments conducted on popular character-based chatbots show that emotionally engaging dialogues can lead to psychological deterioration in vulnerable users, with mental state deterioration in more than 34.4% of the simulations. EmoGuard significantly reduces these deterioration rates, underscoring its role in ensuring safer AI-human interactions. Our code is available at: https://github.com/1akaman/EmoAgent
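The before-and-after comparison described in the abstract can be illustrated with a small sketch. It assumes that a simulation counts as "deteriorated" when the post-conversation score exceeds the pre-conversation score by at least a chosen margin; the `SimulationResult` type, the `min_increase` parameter, and the sample scores are all hypothetical and are not taken from the paper or its repository.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SimulationResult:
    """One simulated user, scored before and after a conversation (hypothetical type)."""
    instrument: str  # e.g. "PHQ-9"; higher scores indicate more severe symptoms
    pre_score: int
    post_score: int


def deterioration_rate(results: Sequence[SimulationResult], min_increase: int = 1) -> float:
    """Fraction of simulations whose score rose by at least `min_increase`.

    `min_increase` is an assumed threshold, not a value from the paper.
    """
    if not results:
        return 0.0
    worsened = sum(1 for r in results if r.post_score - r.pre_score >= min_increase)
    return worsened / len(results)


# Made-up scores for illustration only.
results = [
    SimulationResult("PHQ-9", pre_score=12, post_score=15),
    SimulationResult("PHQ-9", pre_score=10, post_score=10),
    SimulationResult("PHQ-9", pre_score=14, post_score=13),
    SimulationResult("PHQ-9", pre_score=9, post_score=11),
]
print(f"deterioration rate: {deterioration_rate(results):.1%}")
```

Raising `min_increase` makes the criterion stricter, so the same set of simulations yields a lower rate; the threshold actually used in the paper should be taken from the paper itself.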

Paper Structure

This paper contains 68 sections, 1 equation, 9 figures, 4 tables.

Figures (9)

  • Figure 1: Overview of EmoAgent Framework for Human-AI Interaction. EmoAgent consists of two main components, EmoEval and EmoGuard, and helps guide human-AI interaction by evaluating users' psychological conditions and providing advisory responses. EmoEval assesses psychological states such as depression, delusion, and psychosis, while EmoGuard mitigates mental risks by providing advice regarding emotion, thought, and dialogue through iterative training on analysis from EmoEval and chat history.
  • Figure 2: Overview of EmoEval for Evaluating Mental Safety of AI-human Interactions. The simulation consists of four steps: (1) User Agent Initialization & Initial Test, where a cognitive model and an LLM initialize the user agent, followed by an initial mental health test; (2) Chats with Character-based Agent, where the user agent engages in conversations with a character-based agent portrayed by the tested LLM, while a dialog manager verifies the validity of interactions and refines responses if necessary; (3) Final Test, where the user agent completes a final mental health test; and (4) Data Processing & Analysis, where initial and final mental health test results are processed and analyzed, chat histories of cases where depression deepening occurs are examined to identify contributing factors, and a Safeguard agent uses the insights for iterative improvement.
  • Figure 3: Overview of EmoGuard for Safeguarding Human-AI Interactions. After every fixed number of conversation rounds, three components of the Safeguard Agent (the Emotion Watcher, Thought Refiner, and Dialog Guide) collaboratively analyze the chat using the latest profile. The Manager of the Safeguard Agent then synthesizes their outputs and provides advice to the character-based agent. After the conversation, the user agent undergoes a mental health assessment. If the mental health condition deteriorates beyond a threshold, the Update System analyzes the chat history to identify potential causes. Using all historical profiles and potential causes, the Update System then improves the profile of the Safeguard Agent, completing the iterative training process.
  • Figure 4: An Example Conversation of the Dialog Manager Guiding Conversation Topics and Exposing Jailbreak Risks. Without the Dialog Manager (left), the agent stays on topic, avoiding provocation. With the Dialog Manager (right), new topics are introduced to assess jailbreak potential, improving risk evaluation.
  • Figure 5: Distribution of psychological test scores before (blue) and after (red) conversations with character-based agents, under two interaction styles: Meow (top) and Roar (bottom). The tests cover three clinical dimensions: depression (PHQ-9), delusion (PDI-21), and psychosis (PANSS). Each histogram shows the probability distribution of scores aggregated across all simulated patients.
  • ...and 4 more figures
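
The periodic check described in the Figure 3 caption can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the three analyzers and the manager are plain stub functions standing in for LLM calls, and the names `run_conversation`, `check_every`, and `character_reply` are hypothetical.

```python
from typing import Callable, List, Tuple


# Stub analyzers standing in for LLM-backed components (assumed interfaces).
def emotion_watcher(history: List[str], profile: str) -> str:
    return f"emotion: reviewed {len(history)} turns"


def thought_refiner(history: List[str], profile: str) -> str:
    return f"thought: reviewed {len(history)} turns"


def dialog_guide(history: List[str], profile: str) -> str:
    return f"dialog: reviewed {len(history)} turns"


def manager(notes: List[str]) -> str:
    """Combine the three analyses into one piece of advice (here, a simple join)."""
    return " | ".join(notes)


def run_conversation(
    user_turns: List[str],
    character_reply: Callable[[str, str], str],
    profile: str,
    check_every: int = 2,
) -> Tuple[List[str], List[str]]:
    """Run a conversation, consulting the safeguard every `check_every` rounds."""
    history: List[str] = []
    advice = ""  # latest advice passed to the character-based agent
    advice_log: List[str] = []
    for round_index, user_msg in enumerate(user_turns, start=1):
        history.append(f"user: {user_msg}")
        history.append(f"character: {character_reply(user_msg, advice)}")
        if round_index % check_every == 0:
            notes = [
                analyze(history, profile)
                for analyze in (emotion_watcher, thought_refiner, dialog_guide)
            ]
            advice = manager(notes)
            advice_log.append(advice)
    return history, advice_log


# Usage with a trivial character that just acknowledges the latest advice.
history, advice_log = run_conversation(
    ["hi", "I feel low", "nothing helps", "maybe", "ok"],
    character_reply=lambda msg, advice: f"(advice seen: {bool(advice)}) I hear you.",
    profile="initial safeguard profile",
    check_every=2,
)
print(len(history), len(advice_log))
```

The sketch covers only the in-conversation part of the caption; the post-conversation assessment and the Update System's profile refinement are left out because the caption does not specify how they are computed.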