Prompting-in-a-Series: Psychology-Informed Contents and Embeddings for Personality Recognition With Decoder-Only Models
Jing Jie Tan, Ban-Hoe Kwan, Danny Wee-Kiat Ng, Yan-Chai Hum, Anissa Mokraoui, Shih-Yu Lo
TL;DR
The paper tackles personality recognition from text by introducing PICEPR, a two-pipeline framework that modularizes decoder-only LLMs into Contents and Embeddings components. It defines five LLM roles (Summary, Mimic, Psycho, Classify, Vector) and uses structured prompts with chain-of-thought (CoT) reasoning and JSON outputs to produce robust trait labels and embeddings. On the Essays and Kaggle datasets, PICEPR reports state-of-the-art results, with gains of 5-15% over regular prompting and several baselines, and the paper also analyzes bias, invalid outputs, and cost-efficiency. The study assesses both decoder-only and encoder-only configurations, showing that modular prompting can rival or surpass fine-tuning, with the Embeddings pipeline enabling effective data augmentation for better generalization. Limitations include dataset size, labeling subjectivity, and potential biases, which suggests future work on larger, more diverse datasets and bias-aware evaluation.
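The summary above mentions JSON outputs and an analysis of invalid outputs. As a minimal sketch of how such a reply could be validated, the Python below assumes a hypothetical schema in which the Classify role returns a `reasoning` string and a `labels` object with one boolean per trait. The key names, the Big Five abbreviations, and the function `parse_classify_output` are illustrative assumptions, not the paper's actual format.

```python
import json

# Hypothetical trait keys (Big Five abbreviations); the paper's real schema may differ.
TRAITS = ("O", "C", "E", "A", "N")


def parse_classify_output(raw, traits=TRAITS):
    """Return {trait: bool} parsed from a JSON reply, or None if the reply is invalid."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return None  # not parseable as JSON at all
    if not isinstance(data, dict):
        return None
    labels = data.get("labels")
    if not isinstance(labels, dict):
        return None  # missing or malformed label object
    result = {}
    for trait in traits:
        value = labels.get(trait)
        if not isinstance(value, bool):
            return None  # a missing or non-boolean trait label counts as invalid
        result[trait] = value
    return result


# Example reply in the assumed format.
valid_reply = (
    '{"reasoning": "The writer describes enjoying new experiences.", '
    '"labels": {"O": true, "C": false, "E": true, "A": true, "N": false}}'
)
parsed = parse_classify_output(valid_reply)
```

Returning `None` for any malformed reply keeps invalid outputs easy to count separately from wrong predictions, which is one way such an analysis could be set up.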
Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities across various natural language processing tasks. This research introduces a novel "Prompting-in-a-Series" algorithm, termed PICEPR (Psychology-Informed Contents Embeddings for Personality Recognition), featuring two pipelines: (a) Contents and (b) Embeddings. The approach demonstrates how a modularised decoder-only LLM can summarize or generate content that aids in classifying or enhancing personality recognition, functioning as both a personality feature extractor and a generator of personality-rich content. We conducted various experiments to provide evidence for the rationale behind the PICEPR algorithm. We also explored closed-source models such as gpt4o from OpenAI and gemini from Google, along with open-source models such as mistral from Mistral AI, to compare the quality of the generated content. The PICEPR algorithm achieves new state-of-the-art performance for personality recognition, with a 5-15% improvement. The work repository and model weights can be found at https://research.jingjietan.com/?q=PICEPR.
