From Stimuli to Minds: Enhancing Psychological Reasoning in LLMs via Bilateral Reinforcement Learning

Yichao Feng, Haoran Luo, Lang Feng, Shuai Zhao, Anh Tuan Luu

TL;DR

The paper tackles psychologically grounded reasoning in LLMs, where inferring implicit mental states in complex narratives remains difficult. It introduces StimuliQA, an expert-annotated dataset of real-world psychological stimuli, and Psy-Interpreter, a bilateral reinforcement-learning framework with a multi-term reward (including $r^{\text{F1}}$, $r^{\text{fmt}}$, $r^{\text{BR}}$, and $r^{\text{rep}}$) that fosters structured, theory-aligned reasoning. A trajectory cache and the trajectory-aware GRPO objective $\mathcal{J}_{\text{T-GRPO}}(\theta)$ stabilize learning and promote generalization, while continual learning enables self-improvement from high-confidence predictions. Across six datasets, including ToMbench, SimpleToM, SocialIQa, CosmosQA, and BIG-Bench Hard, the approach yields significant gains in out-of-distribution performance, strong zero-shot generalization, and competitive results against larger models, demonstrating the value of expert supervision and reasoning-aware RL for efficient psychological reasoning in LLMs.

Abstract

Large Language Models show promise in emotion understanding, social reasoning, and empathy, yet they struggle with psychologically grounded tasks that require inferring implicit mental states in context-rich, ambiguous settings. These limitations arise from the absence of theory-aligned supervision and the difficulty of capturing nuanced mental processes in real-world narratives. To address this gap, we leverage expert-labeled, psychologically rich scenarios and propose a trajectory-aware reinforcement learning framework that explicitly imitates expert psychological thought patterns. By integrating real-world stimuli with structured reasoning guidance, our approach enables compact models to internalize social-cognitive principles, perform nuanced psychological inference, and support continual self-improvement. Comprehensive experiments across multiple benchmarks further demonstrate that our models achieve expert-level interpretive capabilities, exhibiting strong out-of-distribution generalization and robust continual learning across diverse, challenging, and psychologically grounded tasks.

Paper Structure

This paper contains 55 sections, 15 equations, 6 figures, 13 tables.

Figures (6)

  • Figure 1: A sample ToMbench question presents two types of psychological stimuli pairs for demonstration.
  • Figure 2: The figure summarizes key psychological parameters across our datasets: emotional variables, narrative transformations, and collective factors, comprising 35,084, 24,831, and 36,890 QA instances, each offering insights into human narratives.
  • Figure 3: The framework comprises: StimuliQA, stimuli with expert psychological labels; Psy-Interpreter, a training framework tailored for psychological tasks; and Continual Learning, demonstrating continual learning capability through self-labeling.
  • Figure 4: Comparison of GRPO training on StimuliQA and two other training datasets on Qwen0.5B and 3B. Full table with Qwen1.5B’s results is located in the Appendix.
  • Figure 5: Comparison of base and bilateral reward training.
  • ...and 1 more figures