Ask, Answer, and Detect: Role-Playing LLMs for Personality Detection with Question-Conditioned Mixture-of-Experts
Yifan Lyu, Liang Zhang
TL;DR
The paper tackles MBTI-style personality detection from social media, where labels are scarce and the mapping from user language to psychological constructs is weakly specified. It proposes ROME, a framework that uses LLM role-play to generate questionnaire-level evidence offline and a question-conditioned Mixture-of-Experts to learn item-level answers. This evidence is fused with post-derived representations in a multi-task setup, providing interpretable, psychology-grounded cues that improve predictions. Empirical results on Kaggle and Pandora show substantial gains over state-of-the-art baselines (up to 15.41% relative improvement) and demonstrate data efficiency under limited supervision. The work also provides interpretability analyses via case studies and routing patterns, highlighting the semantic alignment between questionnaire items, posts, and personality predictions.
Abstract
Understanding human personality is crucial for web applications such as personalized recommendation and mental health assessment. Existing studies on personality detection predominantly adopt a "posts -> user vector -> labels" modeling paradigm, which encodes social media posts into user representations for predicting personality labels (e.g., MBTI labels). While recent advances in large language models (LLMs) have improved text-encoding capacity, these approaches remain constrained by limited supervision signals due to label scarcity and by under-specified semantic mappings between user language and abstract psychological constructs. We address these challenges by proposing ROME, a novel framework that explicitly injects psychological knowledge into personality detection. Inspired by standardized self-assessment tests, ROME leverages LLMs' role-play capability to simulate user responses to validated psychometric questionnaires. These generated question-level answers transform free-form user posts into interpretable, questionnaire-grounded evidence linking linguistic cues to personality labels. This evidence provides rich intermediate supervision that mitigates label scarcity, and it offers a semantic reasoning chain that guides and simplifies learning of the text-to-personality mapping. A question-conditioned Mixture-of-Experts module then jointly routes over post and question representations, learning to answer questionnaire items under explicit supervision. The predicted answers are summarized into an interpretable answer vector and fused with the user representation for final prediction within a multi-task learning framework, where question answering serves as a powerful auxiliary task for personality detection. Extensive experiments on two real-world datasets demonstrate that ROME consistently outperforms state-of-the-art baselines, achieving an improvement of 15.41% on the Kaggle dataset.
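The pipeline described in the abstract (question-conditioned routing, an answer vector, fusion with the user representation, and a multi-task objective) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the class and function names, the random weights, the 5-option answer scale, the option scores in [-1, 1], the choice of dense softmax gating, and the auxiliary weight `aux_weight` are all assumptions made for illustration. The only fixed detail is that MBTI has four binary dimensions.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Matrix-vector product; `weights` is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def rand_matrix(rng, rows, cols):
    """Random weights standing in for learned parameters (assumption)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class QuestionConditionedMoE:
    """Hypothetical MoE whose gate sees both the user and the question vector."""

    def __init__(self, dim, num_experts, num_options, seed=0):
        rng = random.Random(seed)
        self.gate = rand_matrix(rng, num_experts, 2 * dim)
        self.experts = [rand_matrix(rng, num_options, 2 * dim)
                        for _ in range(num_experts)]
        self.num_options = num_options

    def answer(self, user_vec, question_vec):
        joint = user_vec + question_vec  # list concatenation
        gate_weights = softmax(linear(self.gate, joint))
        expert_probs = [softmax(linear(e, joint)) for e in self.experts]
        # Mix the experts' answer distributions using the routing weights.
        mixed = [sum(g * p[k] for g, p in zip(gate_weights, expert_probs))
                 for k in range(self.num_options)]
        return mixed, gate_weights


def forward(moe, head, user_vec, question_vecs):
    """Answer every item, build the answer vector, fuse, predict 4 MBTI traits."""
    n = moe.num_options
    option_scores = [-1.0 + 2.0 * k / (n - 1) for k in range(n)]  # assumed scale
    answer_probs, answer_vec = [], []
    for q in question_vecs:
        probs, _ = moe.answer(user_vec, q)
        answer_probs.append(probs)
        # Expected option score: one interpretable number per questionnaire item.
        answer_vec.append(sum(p * s for p, s in zip(probs, option_scores)))
    fused = user_vec + answer_vec
    trait_probs = [1.0 / (1.0 + math.exp(-z)) for z in linear(head, fused)]
    return trait_probs, answer_probs


def multitask_loss(trait_probs, trait_labels, answer_probs, answer_labels,
                   aux_weight=0.5):
    """Binary cross-entropy on traits plus weighted cross-entropy on item answers."""
    bce = -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
               for p, y in zip(trait_probs, trait_labels)) / len(trait_labels)
    ce = -sum(math.log(probs[a])
              for probs, a in zip(answer_probs, answer_labels)) / len(answer_labels)
    return bce + aux_weight * ce


# Toy usage with made-up vectors and labels (all values are illustrative).
rng = random.Random(1)
dim, num_questions = 8, 3
user_vec = [rng.uniform(-1, 1) for _ in range(dim)]
question_vecs = [[rng.uniform(-1, 1) for _ in range(dim)]
                 for _ in range(num_questions)]
moe = QuestionConditionedMoE(dim, num_experts=4, num_options=5)
head = rand_matrix(random.Random(2), 4, dim + num_questions)
trait_probs, answer_probs = forward(moe, head, user_vec, question_vecs)
# answer_labels play the role of the LLM role-play answers generated offline.
loss = multitask_loss(trait_probs, [1, 0, 1, 1], answer_probs, [4, 0, 2])
print(trait_probs, loss)
```

In this sketch the item-level answer labels stand in for the role-played questionnaire responses, so the auxiliary term supplies supervision for every item even when only one personality label per user is available.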
