Large Language Model Agent Personality and Response Appropriateness: Evaluation by Human Linguistic Experts, LLM-as-Judge, and Natural Language Processing Model
Eswari Jayakumar, Niladri Sekhar Dash, Debasmita Mukherjee
TL;DR
This paper tackles the challenge of assessing prompted personality in LLM-based agents during poetry explanation tasks. It couples LangChain/RAG-based agent design with a Bloom-inspired, linguistically grounded question bank and evaluates responses via a triad of methods: a transformer-based personality predictor, a Judge LLM, and human linguistic experts. Findings reveal limitations and biases in purely data-driven evaluation, underscoring the need for interdisciplinary design and psychometric validation to reliably infer agent personality. The proposed framework offers a robust approach to designing and validating personality-aware LLM agents for interactive NLP systems.
Abstract
While Large Language Model (LLM)-based agents can be used to create highly engaging interactive applications by prompting them with personality traits and contextual data, effectively assessing their personalities has proven challenging. We address this gap with an interdisciplinary approach that combines agent development and linguistic analysis to assess the prompted personality of LLM-based agents in a poetry explanation task. We developed a flexible question bank, informed by linguistic assessment criteria and human cognitive learning levels, that offers a more comprehensive evaluation than current methods. By evaluating agent responses with natural language processing models, other LLMs, and human experts, we illustrate the limitations of purely deep learning solutions and emphasize the critical role of interdisciplinary design in agent development.
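The three-way evaluation described above can be pictured with a minimal sketch. It is an illustration under stated assumptions, not the authors' implementation: the `Verdict` record, the `agreement_by_level` helper, the personality labels, and the use of the six revised Bloom's taxonomy levels as the "cognitive learning levels" are all hypothetical. The sketch only shows one way to tabulate how often the NLP model, the Judge LLM, and the human expert assign the same personality label at each cognitive level.

```python
from collections import Counter
from dataclasses import dataclass

# Assumption: the question bank is keyed by the six levels of the revised
# Bloom's taxonomy, ordered from lowest to highest cognitive demand.
BLOOM_LEVELS = ["remember", "understand", "apply", "analyze", "evaluate", "create"]


@dataclass
class Verdict:
    """Hypothetical record: one agent response as labeled by each evaluator."""
    question_id: str
    bloom_level: str
    nlp_model: str      # label from the transformer-based personality predictor
    judge_llm: str      # label from the Judge LLM
    human_expert: str   # label from a human linguistic expert


def agreement_by_level(verdicts):
    """Return, per Bloom level, the fraction of responses on which all
    three evaluators assigned the same personality label."""
    totals, agree = Counter(), Counter()
    for v in verdicts:
        totals[v.bloom_level] += 1
        if v.nlp_model == v.judge_llm == v.human_expert:
            agree[v.bloom_level] += 1
    # Levels with no responses are omitted rather than reported as 0.
    return {lvl: agree[lvl] / totals[lvl] for lvl in BLOOM_LEVELS if totals[lvl]}


if __name__ == "__main__":
    # Made-up labels for illustration only; not data from the paper.
    sample = [
        Verdict("q1", "remember", "introvert", "introvert", "introvert"),
        Verdict("q2", "remember", "extravert", "introvert", "introvert"),
        Verdict("q3", "analyze", "extravert", "extravert", "extravert"),
    ]
    print(agreement_by_level(sample))
```

A low agreement fraction at a given level would flag questions where the automated evaluators diverge from the human expert, which is the kind of discrepancy the abstract attributes to purely deep learning solutions.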
