
Continuous Output Personality Detection Models via Mixed Strategy Training

Rong Wang, Kun Sun

TL;DR

This work tackles the problem of predicting continuous personality trait scores from text, addressing the limitation that traditional models produce only binary outputs. It uses the PANDORA dataset to fine-tune a RoBERTa-base model with a range of mixed training strategies, including MLP integration, hyperparameter optimization, data augmentation, and ensemble methods. The best-performing model (M3) achieves better regression and trait-level accuracy than the baselines, illustrating the value of combining these training techniques to produce continuous Big Five outputs. The findings suggest wide applicability in AI, psychology, HR, marketing, and healthcare, enabling more nuanced personality assessments from language data than binary approaches allow. The work also provides a practical benchmark showing that continuous-output approaches can outperform traditional MBTI/essay-based binary models on established datasets.

Abstract

Traditional personality detection models yield only binary results. This paper presents a novel approach to training personality detection models that produce continuous output values, using mixed strategies. Leveraging the PANDORA dataset, which includes extensive personality labeling of Reddit comments, we developed models that predict the Big Five personality traits with high accuracy. Our approach involves fine-tuning a RoBERTa-base model with strategies such as Multi-Layer Perceptron (MLP) integration and hyperparameter tuning. The results demonstrate that our models significantly outperform traditional binary classification methods, offering precise continuous outputs for personality traits and thus enhancing applications in AI, psychology, human resources, marketing, and healthcare.
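To make the contrast between binary and continuous outputs concrete, the following is a minimal sketch of an MLP regression head that maps a pooled text embedding to five continuous trait scores. It is an illustration under stated assumptions, not the authors' implementation: the weights are random placeholders, the hidden size of 16, the sigmoid output scaled to (0, 1), and the mean squared error loss are all assumed, and the function names are hypothetical. The only detail taken as given is the input size of 768, which is the hidden size of RoBERTa-base.

```python
import math
import random

# Big Five trait names; the ordering here is an illustrative choice.
TRAITS = ["openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism"]


def make_mlp_head(in_dim, hidden_dim, out_dim, seed=0):
    """Build a two-layer MLP head with random, untrained placeholder weights."""
    rng = random.Random(seed)

    def layer(n_in, n_out):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        return weights, biases

    return layer(in_dim, hidden_dim), layer(hidden_dim, out_dim)


def linear(x, weights, biases):
    """Apply one affine layer: weights @ x + biases."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def predict_continuous(embedding, head):
    """Map a pooled embedding to one continuous score per trait in (0, 1)."""
    (w1, b1), (w2, b2) = head
    hidden = [max(0.0, h) for h in linear(embedding, w1, b1)]  # ReLU
    logits = linear(hidden, w2, b2)
    scores = [1.0 / (1.0 + math.exp(-z)) for z in logits]  # assumed sigmoid scaling
    return dict(zip(TRAITS, scores))


def binarize(scores, threshold=0.5):
    """Collapse continuous scores to the binary labels a classifier would emit."""
    return {trait: int(score >= threshold) for trait, score in scores.items()}


def mean_squared_error(pred, target):
    """Regression loss over the five traits (an assumed training objective)."""
    return sum((pred[t] - target[t]) ** 2 for t in TRAITS) / len(TRAITS)


if __name__ == "__main__":
    head = make_mlp_head(in_dim=768, hidden_dim=16, out_dim=len(TRAITS))
    rng = random.Random(1)
    embedding = [rng.uniform(-1.0, 1.0) for _ in range(768)]  # stand-in for an encoder output
    scores = predict_continuous(embedding, head)
    print(scores)
    print(binarize(scores))
```

Binarizing discards how far each score lies from the threshold, which is the information a continuous-output model keeps.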

Paper Structure

This paper contains 7 sections, 1 figure, and 8 tables.

Figures (1)

  • Figure 1: The roadmap of the present study