Personality-aware Human-centric Multimodal Reasoning: A New Task, Dataset and Baselines

Yaochen Zhu; Xiangqing Shen; Rui Xia

TL;DR

The paper defines Personality-aware Human-centric Multimodal Reasoning ($T^1$) and introduces the PHMRD dataset, built from six TV shows, for forecasting a specific individual's future behavior from past multimodal signals and personality traits. It demonstrates that integrating personality improves reasoning performance over baselines and presents a concrete architecture (PRM) that leverages Merlot Reserve encoders and trait embeddings. An extension task ($T^2$) that uses predicted personality (via MPPD) is proposed to address annotation gaps, and experiments show that predicted personalities can nearly match the performance obtained with annotated ones, enabling practical deployment. The work contributes a new task, a large-scale dataset, baseline models, and an extension path for personality prediction, with public release of the dataset and code planned.

Abstract

Personality traits, emotions, and beliefs shape individuals' behavioral choices and decision-making processes. However, the affective computing community has largely focused on predicting personality traits while overlooking their application in behavior prediction, and multimodal reasoning tasks have emphasized the prediction of future states and behaviors while often neglecting individual personality traits. In this work, we introduce a new task called Personality-aware Human-centric Multimodal Reasoning (PHMR) (T1), whose goal is to forecast the future behavior of a particular individual using multimodal information from past instances while integrating personality factors. We accordingly construct a new dataset based on six television shows, encompassing 225 characters and 12k samples. To establish a benchmark for the task, we propose seven baseline methods: three adapted from related tasks, two pre-trained models, and two multimodal large language models. The experimental results demonstrate that incorporating personality traits enhances human-centric multimodal reasoning performance. To address the lack of personality annotation in real-life scenes, we introduce an extension task called Personality-predicted Human-centric Multimodal Reasoning (T2), along with the corresponding dataset and method. We will make our dataset and code available on GitHub.
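
To make the input/output shape of the task concrete, the following is a minimal sketch that assumes the task can be framed as selecting the most plausible behavior description from a set of candidates, as the Figure 1 caption suggests. The names `PHMRSample`, `predict_behavior`, and `keyword_overlap_scorer`, the field layout, and the personality label format are all hypothetical and are not taken from the paper. The scorer is a toy word-overlap heuristic standing in for a learned model such as PRM.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PHMRSample:
    """Hypothetical container for one sample: past context plus candidates."""
    video_frames: List[str]      # placeholder identifiers for past video frames
    dialogue: List[str]          # past utterances
    audio_clips: List[str]       # placeholder identifiers for past audio
    personality: Dict[str, str]  # assumed format: trait name -> label for the target person
    candidates: List[str]        # candidate behavior descriptions at T_n


Scorer = Callable[[PHMRSample, str], float]


def predict_behavior(sample: PHMRSample, scorer: Scorer) -> int:
    """Return the index of the candidate with the highest plausibility score."""
    scores = [scorer(sample, candidate) for candidate in sample.candidates]
    return max(range(len(scores)), key=scores.__getitem__)


def keyword_overlap_scorer(sample: PHMRSample, candidate: str) -> float:
    """Toy stand-in for a learned model.

    Counts candidate words that also occur in the dialogue or in the
    personality labels; video and audio are ignored in this sketch.
    """
    context = " ".join(sample.dialogue + list(sample.personality.values()))
    context_words = set(context.lower().split())
    return float(sum(word in context_words for word in candidate.lower().split()))


# Illustrative sample; all values are made up for this sketch.
sample = PHMRSample(
    video_frames=["frame_0", "frame_1"],
    dialogue=["I really want to leave this party"],
    audio_clips=["clip_0"],
    personality={"extraversion": "introverted"},
    candidates=[
        "Person 0 dances with the crowd",
        "Person 0 decides to leave early",
    ],
)

predicted_index = predict_behavior(sample, keyword_overlap_scorer)
print(sample.candidates[predicted_index])
```

In this sketch the personality labels enter the scorer simply as extra context words. A real model would instead encode them, for example as trait embeddings combined with the multimodal representation, as the TL;DR describes for PRM.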

Paper Structure (37 sections, 5 equations, 4 figures, 14 tables)

Introduction
Task
Dataset
Dataset Source
Dataset Construction Process
Data Filtering
Rewriting of Behavior Description
Personality Annotation
Dataset Statistics
Method
Multimodal Signal Representation
Personality-aware Multimodal Reasoning
Extension: Human-centric Multimodal Reasoning with Predicted Personality
Experiments
Experimental Settings
...and 22 more sections

Figures (4)

Figure 1: Illustrations of PHMR and PHMRD. Person $\mathit{0}$, $\mathit{1}$, and $\mathit{2}$ represent the characters appearing in the video clip. Our task is to predict the most plausible behavior description at time $T_n$ based on video, dialogue, audio, and personality information.
Figure 2: PRM$_{pretrain}$ model for PHMR.
Figure 3: PRM$_{pretrain}$ for Personality prediction and Personality-aware HMR.
Figure 4: three personalities.
