Personality-aware Human-centric Multimodal Reasoning: A New Task, Dataset and Baselines
Yaochen Zhu, Xiangqing Shen, Rui Xia
TL;DR
The paper defines Personality-aware Human-centric Multimodal Reasoning ($T^1$) and introduces the PHMRD dataset, built from six TV shows, to forecast a specific individual's future behavior from past multimodal signals and personality traits. It shows that integrating personality improves reasoning performance over baselines and presents a concrete architecture (PRM) that combines MERLOT Reserve encoders with trait embeddings. An extension task ($T^2$), which uses predicted personality (via MPPD), is proposed to address annotation gaps; experiments show that predicted personalities can nearly match the performance obtained with annotated ones, which makes practical deployment feasible. The work contributes a new task, a large-scale dataset, baseline models, and an extension path for personality prediction, with public release of the dataset and code planned.
Abstract
Personality traits, emotions, and beliefs shape individuals' behavioral choices and decision-making processes. However, the affective computing community has largely focused on predicting personality traits while overlooking their application to behavior prediction. Multimodal reasoning research, in turn, has emphasized the prediction of future states and behaviors but has often neglected individual personality traits. In this work, we introduce a new task called Personality-aware Human-centric Multimodal Reasoning (PHMR, T1), whose goal is to forecast the future behavior of a particular individual using multimodal information from past instances while integrating personality factors. We accordingly construct a new dataset based on six television shows, encompassing 225 characters and 12k samples. To establish a benchmark for the task, we propose seven baseline methods: three adapted from related tasks, two pre-trained models, and two multimodal large language models. The experimental results demonstrate that incorporating personality traits enhances human-centric multimodal reasoning performance. To address the lack of personality annotations in real-life scenes, we further introduce an extension task called Personality-predicted Human-centric Multimodal Reasoning (T2), along with the corresponding dataset and method. We will make our dataset and code available on GitHub.
