From Stimuli to Minds: Enhancing Psychological Reasoning in LLMs via Bilateral Reinforcement Learning

Yichao Feng; Haoran Luo; Lang Feng; Shuai Zhao; Anh Tuan Luu

TL;DR

The paper tackles psychologically grounded reasoning in LLMs, where inferring implicit mental states from complex narratives remains difficult. It introduces StimuliQA, an expert-annotated dataset of real-world psychological stimuli, and Psy-Interpreter, a bilateral reinforcement-learning framework trained with a multi-term reward (including $r^{\text{F1}}$, $r^{\text{fmt}}$, $r^{\text{BR}}$, and $r^{\text{rep}}$) to foster structured, theory-aligned reasoning. A Trajectory Cache and the Trajectory-aware GRPO objective $\mathcal{J}_{\text{T-GRPO}}(\theta)$ stabilize learning and promote generalization, while continual learning enables self-improvement from high-confidence predictions. Across six datasets, including ToMbench, SimpleToM, SocialIQa, CosmosQA, and BIG-Bench Hard, the approach yields significant out-of-distribution gains, strong zero-shot generalization, and results competitive with larger models, demonstrating the value of expert supervision and reasoning-aware RL for efficient psychological reasoning in LLMs.
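
To make the reward and objective terminology above more concrete, here is a minimal sketch of how several reward terms might be folded into one scalar and then turned into group-relative advantages, as in standard GRPO. It is not the paper's implementation: this summary does not say how the terms are weighted or how $r^{\text{BR}}$ is computed, so the function names, the plain weighted sum, the default unit weights, and the use of the population standard deviation are all assumptions made for illustration. The trajectory cache and the trajectory-aware part of $\mathcal{J}_{\text{T-GRPO}}(\theta)$ are not modeled.

```python
from statistics import mean, pstdev


def combined_reward(terms, weights=None):
    """Fold named reward terms into one scalar.

    Assumption: a plain weighted sum with unit default weights. The summary
    names the terms (F1, fmt, BR, rep) but not how they are combined.
    """
    if weights is None:
        weights = {name: 1.0 for name in terms}
    return sum(weights[name] * value for name, value in terms.items())


def group_relative_advantages(rewards, eps=1e-8):
    """Standard GRPO-style advantages: normalize each reward within its group.

    Each sampled response to the same prompt gets (r - group mean) / group std.
    Assumption: population standard deviation; eps avoids division by zero.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical per-term scores for three sampled responses to one prompt.
    group = [
        {"f1": 0.5, "fmt": 1.0, "br": 0.25, "rep": -0.25},
        {"f1": 0.75, "fmt": 1.0, "br": 0.5, "rep": 0.0},
        {"f1": 0.25, "fmt": 0.0, "br": 0.0, "rep": -0.5},
    ]
    rewards = [combined_reward(terms) for terms in group]
    for reward, advantage in zip(rewards, group_relative_advantages(rewards)):
        print(f"reward={reward:+.2f}  advantage={advantage:+.3f}")
```

Responses scoring above the group mean receive positive advantages and those below receive negative ones, so the update favors the better responses in each sampled group without needing a separate value model.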

Abstract

Large Language Models show promise in emotion understanding, social reasoning, and empathy, yet they struggle with psychologically grounded tasks that require inferring implicit mental states in context-rich, ambiguous settings. These limitations arise from the absence of theory-aligned supervision and the difficulty of capturing nuanced mental processes in real-world narratives. To address this gap, we leverage expert-labeled, psychologically rich scenarios and propose a trajectory-aware reinforcement learning framework that explicitly imitates expert psychological thought patterns. By integrating real-world stimuli with structured reasoning guidance, our approach enables compact models to internalize social-cognitive principles, perform nuanced psychological inference, and support continual self-improvement. Comprehensive experiments across multiple benchmarks further demonstrate that our models achieve expert-level interpretive capabilities, exhibiting strong out-of-distribution generalization and robust continual learning across diverse, challenging, and psychologically grounded tasks.
