BIG5-TPoT: Predicting BIG Five Personality Traits, Facets, and Items Through Targeted Preselection of Texts

Triet M. Le, Arjun Chandra, C. Anton Rytting, Valerie P. Karuzis, Vladimir Rife, William A. Simpson

TL;DR

TPoT addresses the challenge of predicting BIG5 personality traits from text when input volume is large and LLM token limits are restrictive. The method semantically preselects text by computing sentence-level embeddings and retaining those most related to a target trait, facet, or item via cosine similarity, then aggregates them into a document embedding for regression or ordinal prediction. Across trait, facet, and item prediction, the BIG5-TPoT framework demonstrates consistent MAE reductions and higher accuracy than baseline regression and a standard M1 approach, with item-level ordinal regression offering the strongest gains. The approach is transferable to other text domains and offers a practical route to leveraging long-form text while respecting LLM input constraints.
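The preselection step described above can be illustrated with a minimal sketch. It assumes sentence embeddings and a target (trait, facet, or item) embedding are already available as vectors. The toy 3-dimensional vectors, the top-`k` selection rule, the mean pooling, and the helper names `preselect` and `document_embedding` are illustrative assumptions, not details confirmed by the paper.

```python
import numpy as np


def cosine_similarity(matrix, vector):
    """Cosine similarity between each row of `matrix` and `vector`."""
    matrix = np.asarray(matrix, dtype=float)
    vector = np.asarray(vector, dtype=float)
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(vector)
    # Guard against division by zero for all-zero embeddings.
    return (matrix @ vector) / np.clip(norms, 1e-12, None)


def preselect(sentence_embs, target_emb, k):
    """Return indices of the k sentences most similar to the target.

    Assumption: a fixed top-k cutoff; a similarity threshold would be
    an equally plausible selection rule.
    """
    sims = cosine_similarity(sentence_embs, target_emb)
    top = np.argsort(-sims, kind="stable")[:k]
    # Restore original sentence order among the retained sentences.
    return np.sort(top), sims


def document_embedding(sentence_embs, keep):
    """Aggregate retained sentence embeddings (assumption: mean pooling)."""
    return np.asarray(sentence_embs, dtype=float)[keep].mean(axis=0)


# Toy embeddings standing in for real sentence-encoder output.
sentence_embs = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
target_emb = np.array([1.0, 0.0, 0.0])  # hypothetical trait embedding

keep, sims = preselect(sentence_embs, target_emb, k=2)
doc_emb = document_embedding(sentence_embs, keep)
print(keep, doc_emb)
```

In this toy case the first two sentences point almost exactly along the target direction, so they are retained and averaged. The resulting vector is what a downstream regression or ordinal head would consume.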

Abstract

Predicting an individual's personality from the texts they generate is a challenging task, especially when the text volume is large. In this paper, we introduce a novel, straightforward yet effective strategy called targeted preselection of texts (TPoT). The method semantically filters the texts given as input to a deep learning model designed to predict a Big Five personality trait, facet, or item; we refer to the resulting model as BIG5-TPoT. By selecting texts that are semantically relevant to a particular trait, facet, or item, the strategy not only addresses the input text limits of large language models but also improves Mean Absolute Error and accuracy on the Stream of Consciousness Essays dataset.

Paper Structure

This paper contains 6 sections, 4 equations, 3 figures, 8 tables.

Figures (3)

  • Figure 1: Trait, facet, and item regression, either by finetuning a pretrained BERT or by a feature-based approach (no finetuning).
  • Figure 2: Trait, facet, and item regression with TPoT, either by finetuning a pretrained BERT or by a feature-based approach (no finetuning).
  • Figure 3: Plots of the one-dimensional logistic distribution and its cumulative distribution function with $\mu = 2.6$ and $s = 0.3$.
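For readers who want to reproduce the curves in Figure 3, here is a small sketch of the standard logistic density and cumulative distribution function, evaluated with the $\mu = 2.6$ and $s = 0.3$ given in the caption. The function names are illustrative, and how the paper uses this distribution in its ordinal model is not shown here.

```python
import math


def logistic_pdf(x, mu, s):
    """Density of the logistic distribution: e^{-z} / (s * (1 + e^{-z})^2),
    with z = (x - mu) / s. The density is symmetric in z, so abs(z) is
    used to avoid overflow for large negative z."""
    z = (x - mu) / s
    e = math.exp(-abs(z))
    return e / (s * (1.0 + e) ** 2)


def logistic_cdf(x, mu, s):
    """Cumulative distribution function: 1 / (1 + e^{-z}), computed in a
    numerically stable form for both signs of z."""
    z = (x - mu) / s
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


MU, S = 2.6, 0.3  # parameters taken from the Figure 3 caption

peak_density = logistic_pdf(MU, MU, S)   # equals 1 / (4 * s) at x = mu
median_prob = logistic_cdf(MU, MU, S)    # equals 0.5 at x = mu
print(peak_density, median_prob)
```

The density peaks at $x = \mu$ with height $1/(4s)$, and the cumulative function crosses 0.5 at the same point, which gives a quick sanity check against the plotted curves.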