Can Large Language Models Understand You Better? An MBTI Personality Detection Dataset Aligned with Population Traits

Bohan Li; Jiannan Guan; Longxu Dou; Yunlong Feng; Dingzirui Wang; Yang Xu; Enbo Wang; Qiguang Chen; Bichen Wang; Xiao Xu; Yimeng Zhang; Libo Qin; Yanyan Zhao; Qingfu Zhu; Wanxiang Che

TL;DR

This paper argues that results on existing MBTI datasets are overly optimistic because those datasets rely on self-reported hard (binary) labels that misrepresent the population distribution of personality traits. It introduces MBTIBench, the first MBTI dataset with manually annotated soft labels, built under the guidance of psychologists by combining data-filtering guidelines, rigorous annotation, and an EM-based Dawid-Skene approach that estimates a soft label for each dimension. The soft labels align better with population traits, and the experiments reveal polarized predictions and biases in LLMs while showing that simple baselines and zero-shot prompting can outperform some more advanced configurations. By providing a population-aware, soft-label MBTI resource together with an analysis of labeling practices and model behavior, the study offers practical pathways toward more nuanced psychological tasks and downstream applications.

Abstract

The Myers-Briggs Type Indicator (MBTI) is one of the most influential personality theories reflecting individual differences in thinking, feeling, and behaving. MBTI personality detection has garnered considerable research interest and has evolved significantly over the years. However, this task tends to be overly optimistic, as it currently does not align well with the natural distribution of population personality traits. Specifically, (1) the self-reported labels in existing datasets result in incorrect labeling issues, and (2) the hard labels fail to capture the full range of population personality distributions. In this paper, we optimize the task by constructing MBTIBench, the first manually annotated high-quality MBTI personality detection dataset with soft labels, under the guidance of psychologists. As for the first challenge, MBTIBench effectively solves the incorrect labeling issues, which account for 29.58% of the data. As for the second challenge, we estimate soft labels by deriving the polarity tendency of samples. The obtained soft labels confirm that there are more people with non-extreme personality traits. Experimental results not only highlight the polarized predictions and biases in LLMs as key directions for future research, but also confirm that soft labels can provide more benefits to other psychological tasks than hard labels. The code and data are available at https://github.com/Personality-NLP/MbtiBench.
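
To make the idea of estimating soft labels from several annotators more concrete, the following is a minimal sketch of a binary Dawid-Skene estimator fitted with EM, the family of methods named in the TL;DR. It is an illustration under stated assumptions, not the authors' implementation: the function name `dawid_skene_binary`, the binary 0/1 vote encoding, the smoothing constant, and the toy votes are all hypothetical, and the paper's actual annotation scheme may differ.

```python
import math


def dawid_skene_binary(votes, n_iter=50, smooth=0.01):
    """Return P(label == 1) per item from binary annotator votes.

    votes[i][j] is 0, 1, or None (annotator j skipped item i).
    Assumes every item has at least one vote. Hypothetical helper.
    """
    n_items = len(votes)
    n_annot = len(votes[0])

    # Initialise soft labels with the per-item vote average.
    q = []
    for row in votes:
        seen = [v for v in row if v is not None]
        q.append(sum(seen) / len(seen))

    for _ in range(n_iter):
        # M-step: class prior and per-annotator accuracy on each true class.
        prior = (sum(q) + smooth) / (n_items + 2 * smooth)
        sens, spec = [], []  # P(vote=1 | true=1), P(vote=0 | true=0)
        for j in range(n_annot):
            hit1 = tot1 = hit0 = tot0 = 0.0
            for i, row in enumerate(votes):
                v = row[j]
                if v is None:
                    continue
                tot1 += q[i]
                tot0 += 1.0 - q[i]
                if v == 1:
                    hit1 += q[i]
                else:
                    hit0 += 1.0 - q[i]
            sens.append((hit1 + smooth) / (tot1 + 2 * smooth))
            spec.append((hit0 + smooth) / (tot0 + 2 * smooth))

        # E-step: posterior over the true label given all votes on the item.
        for i, row in enumerate(votes):
            log1, log0 = math.log(prior), math.log(1.0 - prior)
            for j, v in enumerate(row):
                if v is None:
                    continue
                log1 += math.log(sens[j] if v == 1 else 1.0 - sens[j])
                log0 += math.log(spec[j] if v == 0 else 1.0 - spec[j])
            q[i] = 1.0 / (1.0 + math.exp(log0 - log1))
    return q


# Toy votes from three hypothetical annotators on one dimension (e.g. 1 = E, 0 = I).
votes = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 0],
    [0, 0, 1],
    [0, 0, 0],
    [1, 0, 1],
]
soft_labels = dawid_skene_binary(votes)
```

Each returned value is a probability rather than a hard 0/1 decision. Unlike a plain vote average, the estimate weights each annotator by how reliable the model infers that annotator to be, so the dissent of a less consistent annotator moves the soft label less.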
