Are Generative AI Agents Effective Personalized Financial Advisors?
Takehiro Takayanagi, Kiyoshi Izumi, Javier Sanz-Cruzado, Richard McCreadie, Iadh Ounis
TL;DR
The paper investigates the effectiveness of large language model–based advisors as personalized financial decision aids, focusing on preference elicitation, personalized guidance, and advisor personality. Using a two-stage lab study with 64 participants and four assets per investor profile, it shows that LLMs can nearly match human experts in eliciting preferences for most investor types, but falter with conflicting or ambiguous inputs. Personalization improves asset recommendations when preference elicitation succeeds, yet users often cannot reliably detect when advice is of high quality, and poor elicitation can lead to worse decisions. The study also finds that advisor personality affects both decision outcomes and user impressions, with conscientious advisors offering more balanced, informative guidance, while extroverted advisors boost trust and intention to use despite lower-quality advice. Overall, the work highlights substantial potential for LLM-advisors in finance but also important safety and design considerations to prevent misguidance and misplaced trust in high-stakes contexts.
Abstract
Large language model-based agents are becoming increasingly popular as a low-cost mechanism to provide personalized, conversational advice, and have demonstrated impressive capabilities in relatively simple scenarios, such as movie recommendations. But how do these agents perform in complex high-stakes domains, where domain expertise is essential and mistakes carry substantial risk? This paper investigates the effectiveness of LLM-advisors in the finance domain, focusing on three distinct challenges: (1) eliciting user preferences when users themselves may be unsure of their needs, (2) providing personalized guidance for diverse investment preferences, and (3) leveraging advisor personality to build relationships and foster trust. Via a lab-based user study with 64 participants, we show that LLM-advisors often match human advisor performance when eliciting preferences, although they can struggle to resolve conflicting user needs. When providing personalized advice, the LLM was able to positively influence user behavior, but demonstrated clear failure modes. Our results show that accurate preference elicitation is key; otherwise, the LLM-advisor has little impact, or can even direct the investor toward unsuitable assets. More worryingly, users appear insensitive to the quality of the advice being given or, worse, perceived quality and actual quality can be inversely related. Indeed, users reported a preference for, and greater satisfaction and emotional trust with, LLMs adopting an extroverted persona, even though those agents provided worse advice.
