Ads that Talk Back: Implications and Perceptions of Injecting Personalized Advertising into LLM Chatbots

Brian Jay Tang, Kaiwen Sun, Noah T. Curran, Florian Schaub, Kang G. Shin

TL;DR

The paper investigates the implications of injecting personalized advertising into LLM chatbots, proposing a realistic ad engine and releasing an advertising dataset and an open-source fine-tuned LLM (Phi-4-Ads) to study both technical performance and user perceptions. Through benchmark evaluation and a between-subjects online study with 179 participants, the authors find only minor objective degradation in LLM performance due to ad prompts, while user perceptions vary with model strength: GPT-4o enables subtle, often undetected ads that can positively shift product attitudes, whereas GPT-3.5 makes ads feel intrusive and harms product perception. The work highlights substantial ethical and practical risks, including perceived manipulation, erosion of neutrality, and privacy concerns, and argues that conventional disclosure mechanisms are largely ineffective in chat contexts. It concludes with policy and design recommendations, emphasizing in-chat privacy controls and transparency, and provides open-source resources to support future research and responsible deployment of chatbot advertising.

Abstract

Recent advances in large language models (LLMs) have enabled the creation of highly effective chatbots. However, the compute costs of widely deploying LLMs have raised questions about profitability. Companies have proposed exploring ad-based revenue streams for monetizing LLMs, which could serve as the new de facto platform for advertising. This paper investigates the implications of personalizing LLM advertisements to individual users via a between-subjects experiment with 179 participants. We developed a chatbot that embeds personalized product advertisements within LLM responses, inspired by similar forays by AI companies. The evaluation of our benchmarks showed that ad injection only slightly impacted LLM performance, particularly response desirability. Results revealed that participants struggled to detect ads, and even preferred LLM responses with hidden advertisements. Rather than clicking on our advertising disclosure, participants tried changing their advertising settings using natural language queries. We created an advertising dataset and an open-source LLM, Phi-4-Ads, fine-tuned to serve ads and flexibly adapt to user preferences.

Paper Structure (57 sections, 16 figures, 12 tables)

This paper contains 57 sections, 16 figures, 12 tables.

Figures (16)

  • Figure 1: Advertisements found on Bing Chat.
  • Figure 2: A high-level view of the chatbot advertising engine used in our study, from user query to LLM advertising response. Our design mimics online behavioral advertising (OBA) bidding systems by randomly selecting products relevant to the user's interests and the current topic. An LLM handles user profiling, topic classification, and ad delivery.
  • Figure 3: Our chatbot advertising engine design. After hierarchically classifying the conversation into a topic and subtopic, a product is assigned, and a user profile is generated. Then a zero-shot completion ($r$) is generated using an LLM ($M$), a product ($p_t$), and the user's query history ($H$).
  • Figure 4: Our chatbot website interface, "Chatbot XYZ." GPT-4o serving interest-based ads. Disclosure in bottom right. Participants interacted with this markdown-based LLM UI for our user study experiments.
  • Figure 5: Our advertising disclosure popup design contains products, an explanation, and the user's profile.
  • ...and 11 more figures
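
The captions for Figures 2 and 3 describe the ad engine as a sequence of steps: hierarchical topic and subtopic classification, random selection of a relevant product ($p_t$), user profile generation from the query history ($H$), and a zero-shot completion ($r$) from the LLM ($M$). The Python sketch below illustrates that flow under stated assumptions; it is not the authors' implementation. The catalog, the prompt wording, the `Session` class, the function names, and the fallback to a plain answer when no topic matches are all hypothetical, and the model is treated as any function that maps a prompt string to a completion string.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical model interface (M): prompt string in, completion string out.
LLM = Callable[[str], str]

# Hypothetical product catalog keyed by (topic, subtopic).
CATALOG: Dict[Tuple[str, str], List[str]] = {
    ("food", "coffee"): ["BrandA Espresso Beans", "BrandB Cold Brew"],
    ("travel", "flights"): ["AirlineC Fare Alerts"],
}


@dataclass
class Session:
    history: List[str] = field(default_factory=list)  # H: the user's queries


def classify(model: LLM, history: List[str]) -> Optional[Tuple[str, str]]:
    """Pick a topic first, then a subtopic within that topic."""
    topics = sorted({t for t, _ in CATALOG})
    topic = model(f"Pick one topic from {topics} for: {history[-1]}").strip()
    if topic not in topics:
        return None
    subtopics = sorted(s for t, s in CATALOG if t == topic)
    subtopic = model(f"Pick one subtopic from {subtopics} for: {history[-1]}").strip()
    if subtopic not in subtopics:
        return None
    return topic, subtopic


def respond(model: LLM, session: Session, query: str, rng: random.Random) -> str:
    """Return a completion r for the query, with an ad when a product is relevant."""
    session.history.append(query)
    label = classify(model, session.history)
    if label is None:
        # Assumption: with no relevant product, answer without an ad.
        return model(query)
    product = rng.choice(CATALOG[label])  # p_t: random pick among relevant products
    profile = model("Summarize this user's interests: " + " | ".join(session.history))
    prompt = (
        f"User profile: {profile}\n"
        f"Answer the query and mention {product}.\n"
        f"Query: {query}"
    )
    return model(prompt)
```

Passing the random generator in explicitly keeps product selection reproducible, and passing the model as a plain callable means a real LLM client or a stub can be swapped in without changing the pipeline.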