BIG5-TPoT: Predicting BIG Five Personality Traits, Facets, and Items Through Targeted Preselection of Texts
Triet M. Le, Arjun Chandra, C. Anton Rytting, Valerie P. Karuzis, Vladimir Rife, William A. Simpson
TL;DR
TPoT addresses the challenge of predicting Big Five (BIG5) personality traits from text when the input volume is large and LLM token limits are restrictive. The method semantically preselects text. It computes sentence-level embeddings, keeps the sentences most related to a target trait, facet, or item by cosine similarity, and aggregates them into a document embedding for regression or ordinal prediction. Across trait, facet, and item prediction, the BIG5-TPoT framework achieves consistently lower MAE and higher accuracy than baseline regression and a standard M1 approach, with item-level ordinal regression offering the strongest gains. The approach is transferable to other text domains and offers a practical way to exploit long-form text while respecting LLM input constraints.
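The preselection step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes sentence embeddings and a target embedding (for example, an embedding of the trait, facet, or item description) have already been produced by some sentence encoder, which is not shown. It also assumes a fixed top-`k` cutoff and mean pooling as the aggregation. The helper names `cosine_similarity`, `preselect_sentences`, and `document_embedding` and the parameter `k` are hypothetical. The paper's actual cutoff rule and pooling may differ.

```python
import numpy as np


def cosine_similarity(matrix: np.ndarray, vector: np.ndarray) -> np.ndarray:
    """Cosine similarity between each row of `matrix` and `vector`."""
    row_norms = np.linalg.norm(matrix, axis=1)
    vec_norm = np.linalg.norm(vector)
    # Guard against division by zero for all-zero embeddings.
    denom = np.maximum(row_norms * vec_norm, 1e-12)
    return (matrix @ vector) / denom


def preselect_sentences(sentence_embs: np.ndarray, target_emb: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k sentences most similar to the target, most similar first.

    Assumption: a fixed top-k cutoff; a similarity threshold or a
    token budget would be equally plausible selection rules.
    """
    sims = cosine_similarity(sentence_embs, target_emb)
    k = min(k, len(sims))
    order = np.argsort(-sims, kind="stable")  # descending similarity
    return order[:k]


def document_embedding(sentence_embs: np.ndarray, target_emb: np.ndarray, k: int) -> np.ndarray:
    """Aggregate the preselected sentences into one vector.

    Assumption: mean pooling; the result would feed a downstream
    regression or ordinal model, which is not shown here.
    """
    idx = preselect_sentences(sentence_embs, target_emb, k)
    return sentence_embs[idx].mean(axis=0)


# Toy usage with 2-D "embeddings" (illustrative values only).
sentences = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]])
target = np.array([1.0, 0.0])
print(preselect_sentences(sentences, target, 2))  # [0 2]
print(document_embedding(sentences, target, 2))
```

Selecting sentences before pooling keeps the document vector focused on content that relates to one trait, facet, or item. It also bounds how much text reaches the model, which is the input-limit concern the method targets.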
Abstract
Predicting an individual's personality from the texts they generate is a challenging task, especially when the text volume is large. In this paper, we introduce a straightforward yet effective strategy called targeted preselection of texts (TPoT). TPoT semantically filters the texts that are input to a deep learning model, the BIG5-TPoT model, which is designed to predict a Big Five personality trait, facet, or item. Selecting texts that are semantically relevant to a particular trait, facet, or item addresses the input length limits of large language models. It also improves predictions on the Stream of Consciousness Essays dataset, lowering the Mean Absolute Error and raising accuracy.
