Continuous Output Personality Detection Models via Mixed Strategy Training
Rong Wang, Kun Sun
TL;DR
This work addresses predicting continuous personality trait scores from text, in contrast to the binary outputs of traditional models. It uses the PANDORA dataset to fine-tune a RoBERTa-base model with a mix of training strategies, including MLP integration, hyperparameter optimization, data augmentation, and ensemble methods. The best-performing model (M3) outperforms the baselines on both regression metrics and trait-level accuracy, illustrating the value of combining these training techniques to produce continuous Big Five outputs. The findings suggest applicability in AI, psychology, HR, marketing, and healthcare, where continuous scores allow more nuanced personality assessment from language data than binary labels. The work also provides a practical benchmark showing that continuous-output approaches can outperform traditional MBTI/essay-based binary models on established datasets.
Abstract
Traditional personality detection models yield only binary results. This paper presents a novel approach for training personality detection models that produce continuous output values, using mixed strategies. By leveraging the PANDORA dataset, which includes extensive personality labeling of Reddit comments, we developed models that predict the Big Five personality traits with high accuracy. Our approach involves fine-tuning a RoBERTa-base model with various strategies, such as Multi-Layer Perceptron (MLP) integration and hyperparameter tuning. The results demonstrate that our models significantly outperform traditional binary classification methods and offer precise continuous outputs for personality traits, thus enhancing applications in AI, psychology, human resources, marketing, and health care.
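
To make the contrast between continuous and binary outputs concrete, the following is a minimal, dependency-free sketch of an MLP regression head placed on top of a pooled text embedding. It is not the authors' implementation: the hidden width, the sigmoid output range of (0, 1), the trait ordering, the 0.5 threshold, and the randomly generated stand-in embedding are all assumptions made for illustration. Only the embedding size of 768 reflects RoBERTa-base itself.

```python
import math
import random

# Trait order is an assumption made for this illustration.
TRAITS = ["openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism"]
EMBED_DIM = 768   # hidden size of RoBERTa-base
HIDDEN_DIM = 16   # assumption: illustrative width, not taken from the paper


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for a dense layer with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer):
    """Apply an affine transformation: one output per weight row."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def mlp_head(embedding, hidden_layer, output_layer):
    """Map a pooled text embedding to five continuous trait scores in (0, 1)."""
    hidden = [max(0.0, h) for h in dense(embedding, hidden_layer)]  # ReLU
    logits = dense(hidden, output_layer)
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]  # sigmoid (assumed range)


def binarize(scores, threshold=0.5):
    """Collapse continuous scores to the binary labels a classifier would emit."""
    return [int(s >= threshold) for s in scores]


rng = random.Random(0)
hidden_layer = init_layer(EMBED_DIM, HIDDEN_DIM, rng)
output_layer = init_layer(HIDDEN_DIM, len(TRAITS), rng)

# Stand-in for the encoder output; a real pipeline would feed in the
# representation produced by the fine-tuned RoBERTa-base encoder.
embedding = [rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]

scores = mlp_head(embedding, hidden_layer, output_layer)
continuous = dict(zip(TRAITS, scores))
binary = dict(zip(TRAITS, binarize(scores)))
```

Here `continuous` holds one graded score per trait, while `binary` keeps only which side of the threshold each score falls on. The information lost in that second step is what a continuous-output model is meant to preserve.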
