The Personalization Paradox: Semantic Loss vs. Reasoning Gains in Agentic AI Q&A
Satyajit Movidi, Stephen Russell
TL;DR
This study uses AIVisor, a retrieval-augmented agentic LLM pipeline for student advising, to investigate personalization in agentic Q&A, evaluating it across lexical, semantic, and grounding metrics under a lexically stringent test. The results show metric-dependent trade-offs: personalization improves reasoning and grounding but incurs semantic-similarity penalties when responses are compared against a single generic ground truth, which exposes a methodological limit of standard metrics. Using a Linear Mixed-Effects Model and multi-metric normalization, the work reveals complex interactions among role prompting, retrieval conditioning, and personalization stages. The fully integrated personalized configuration (System K) achieves the strongest composite performance by combining reasoning gains with grounding improvements, and the study offers a methodological framework for robust, transparent personalization in agentic AI.
Abstract
AIVisor, an agentic retrieval-augmented LLM for student advising, was used to examine how personalization affects system performance across multiple evaluation dimensions. Using twelve authentic advising questions intentionally designed to stress lexical precision, we compared ten personalized and non-personalized system configurations and analyzed outcomes with a Linear Mixed-Effects Model across lexical (BLEU, ROUGE-L), semantic (METEOR, BERTScore), and grounding (RAGAS) metrics. Results showed a consistent trade-off: personalization reliably improved reasoning quality and grounding, yet introduced a significant negative interaction on semantic similarity, driven not by poorer answers but by the limits of current metrics, which penalize meaningful personalized deviations from generic reference texts. This reveals a structural flaw in prevailing LLM evaluation methods, which are ill-suited for assessing user-specific responses. The fully integrated personalized configuration produced the highest overall gains, suggesting that personalization can enhance system effectiveness when evaluated with appropriate multidimensional metrics. Overall, the study demonstrates that personalization produces metric-dependent shifts rather than uniform improvements and provides a methodological foundation for more transparent and robust personalization in agentic AI.
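The penalty described above can be illustrated with a toy overlap score. The sketch below is not the study's evaluation code and does not implement BLEU, ROUGE-L, METEOR, or BERTScore; it uses a simple unigram-overlap F1 as a stand-in for reference-based metrics. The function name `unigram_f1` and all three example strings are hypothetical, chosen only to show how a response tailored to a user can score lower against a single generic reference without being a worse answer.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Toy reference-based score: F1 over overlapping lowercase word tokens.

    Hypothetical stand-in for overlap metrics; not the paper's metric suite.
    """
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most as often
    # as it appears in both texts.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Invented advising example (assumption, not data from the study).
reference = "students must complete 120 credits to graduate"
generic = "students must complete 120 credits to graduate"
personalized = (
    "you have 90 credits so you need 30 more of the 120 credits to graduate"
)

generic_score = unigram_f1(generic, reference)
personalized_score = unigram_f1(personalized, reference)

print(f"generic:      {generic_score:.3f}")
print(f"personalized: {personalized_score:.3f}")
```

In this invented case the personalized answer carries more user-specific information but shares fewer tokens with the generic reference, so its score drops. This is the same kind of effect the abstract attributes to evaluating personalized responses against generic ground truth.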
