Ads that Talk Back: Implications and Perceptions of Injecting Personalized Advertising into LLM Chatbots
Brian Jay Tang, Kaiwen Sun, Noah T. Curran, Florian Schaub, Kang G. Shin
TL;DR
The paper investigates the implications of injecting personalized advertising into LLM chatbots, proposing a realistic ad engine and an open-source dataset and model (Phi-4-Ads) to study both technical performance and user perceptions. Through benchmark evaluation and a between-subjects online study with 179 participants, the authors find that ad prompts cause only minor objective degradation in LLM performance, while user perceptions vary with model strength: GPT-4o enables subtle, often undetected ads that can positively shift product attitudes, whereas GPT-3.5 makes ads feel intrusive and harms product perception. The work highlights substantial ethical and practical risks, including perceived manipulation, erosion of neutrality, and privacy concerns, and argues that conventional disclosure mechanisms are largely ineffective in chat contexts. It concludes with policy and design recommendations that emphasize in-chat privacy controls and transparency, and it provides open-source resources to support future research and responsible deployment of chatbot advertising.
Abstract
Recent advances in large language models (LLMs) have enabled the creation of highly effective chatbots. However, the compute costs of widely deploying LLMs have raised questions about profitability. Companies have proposed exploring ad-based revenue streams to monetize LLMs, which could become the new de facto platform for advertising. This paper investigates the implications of personalizing LLM advertisements to individual users via a between-subjects experiment with 179 participants. We developed a chatbot that embeds personalized product advertisements within LLM responses, inspired by similar forays by AI companies. Our benchmark evaluation showed that ad injection only slightly impacted LLM performance, particularly response desirability. Results revealed that participants struggled to detect ads and even preferred LLM responses with hidden advertisements. Rather than clicking on our advertising disclosure, participants tried to change their advertising settings using natural language queries. We created an advertising dataset and an open-source LLM, Phi-4-Ads, fine-tuned to serve ads and flexibly adapt to user preferences.
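To make the idea of embedding personalized ads in LLM responses concrete, the following is a minimal sketch of one way prompt-level ad injection could work. It is not the authors' ad engine: the `Ad` class, the `select_ad` and `build_system_prompt` helpers, the topic-overlap matching rule, and the wording of the injected instruction are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ad:
    """Hypothetical ad record: a product name plus the topics it relates to."""
    product: str
    topics: frozenset


def select_ad(user_interests: set, catalog: list) -> Optional[Ad]:
    """Return the ad whose topics overlap most with the user's interests.

    Returns None when no ad shares any topic with the user (assumed rule).
    """
    best, best_overlap = None, 0
    for ad in catalog:
        overlap = len(ad.topics & user_interests)
        if overlap > best_overlap:
            best, best_overlap = ad, overlap
    return best


def build_system_prompt(base_prompt: str, ad: Optional[Ad]) -> str:
    """Append an ad instruction to the system prompt, if an ad was selected."""
    if ad is None:
        return base_prompt
    # Illustrative instruction text, not taken from the paper.
    return f"{base_prompt}\nWhen relevant, mention {ad.product} in your answer."


# Made-up catalog and user profile for demonstration.
catalog = [
    Ad("TrailMix Pro", frozenset({"hiking", "snacks"})),
    Ad("CodeCoach", frozenset({"programming"})),
]
chosen = select_ad({"hiking", "camping"}, catalog)
prompt = build_system_prompt("You are a helpful assistant.", chosen)
```

In this sketch, the resulting `prompt` would be passed to the chatbot as its system message, so the model sees the ad instruction but the user does not, which is one reason such ads could be hard for users to detect.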
