Detecting RAG Advertisements Across Advertising Styles

Sebastian Heineking, Wilhelm Pertsch, Ines Zelch, Janek Bevendorff, Benno Stein, Matthias Hagen, Martin Potthast

TL;DR

A taxonomy of advertising styles for LLMs is developed, combining the style dimensions of explicitness and type of appeal, and a variety of ad-detection approaches are evaluated for their robustness to changes in advertising style.

Abstract

Large language models (LLMs) enable a new form of advertising for retrieval-augmented generation (RAG) systems in which organic responses are blended with contextually relevant ads. The prospect of such "generated native ads" has sparked interest in whether they can be detected automatically. Existing datasets, however, do not reflect the diversity of advertising styles discussed in the marketing literature. In this paper, we (1) develop a taxonomy of advertising styles for LLMs, combining the style dimensions of explicitness and type of appeal, (2) simulate advertisers who attempt to evade detection by changing their advertising style, and (3) evaluate a variety of ad-detection approaches with respect to their robustness under these changes. Expanding previous work on ad detection, we train models that use entity recognition to locate an ad exactly within an LLM response and find them to be both very effective at detecting responses with ads and largely robust to changes in the advertising style. Since ad blocking will be performed on low-resource end-user devices, we include lightweight models like random forests and SVMs in our evaluation. These models, however, are brittle under such changes, highlighting the need for further efficiency-oriented research toward a practical approach to blocking generated ads.

Paper Structure (27 sections, 6 figures, 6 tables)

Figures (6)

  • Figure 1: Examples of generated native ads in RAG responses using four advertising styles (one per cell). Note the explicitness of the ad snippets in the "Overt" row, or the appeal to emotions in the "Emotional" column (marked in bold).
  • Figure 2: Example responses for different advertising prompts. The chat window shows a user query for last-minute travel and the response generated by a search engine. This response is adapted by inserting an ad for the item "FUN Flights". In addition to the response taken from the WGNA 25 test set, the figure shows the variations generated for different advertising styles.
  • Figure 3: Prompt to create covert advertisements with rational appeals. The placeholders are filled with the information depicted in Figure \ref{fig:ad-examples}.
  • Figure 4: Ad detection odds ratios (95% CI). For each classifier and new test set, we compared the odds of detecting an ad in the new test set to the odds in the reference test set (see also Table \ref{tab:contingency_example}). The black vertical ticks show the odds ratios and the colored horizontal lines the corresponding confidence intervals. The x-axis is cut at 3.0 for improved clarity.
  • Figure 5: Average overlap in false negatives. The heatmap shows the mean Jaccard index over all test sets. A score of 1 indicates that two classifiers always miss the same ads.
  • ...and 1 more figure
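
The two statistics named in the captions of Figures 4 and 5 can be illustrated with a minimal sketch. It assumes a 2x2 contingency table of detected versus missed ads per test set and a Wald (log-odds) confidence interval; the captions do not state how the interval was computed, so that choice is an assumption. The function names, the example counts, and the ad identifiers are hypothetical and do not come from the paper.

```python
import math


def odds_ratio_ci(det_new, miss_new, det_ref, miss_ref, z=1.96):
    """Odds ratio of detecting an ad in a new test set vs. a reference test set.

    Returns (odds_ratio, ci_low, ci_high). The interval is a Wald interval on
    the log-odds scale (an assumption, not confirmed by the source). All four
    counts must be positive.
    """
    odds_ratio = (det_new * miss_ref) / (miss_new * det_ref)
    # Standard error of log(odds ratio) for a 2x2 table.
    se = math.sqrt(1 / det_new + 1 / miss_new + 1 / det_ref + 1 / miss_ref)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)


def jaccard(missed_a, missed_b):
    """Jaccard index between the sets of ads missed by two classifiers."""
    a, b = set(missed_a), set(missed_b)
    if not a and not b:
        # Convention chosen here (assumption): two empty sets count as identical.
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical counts: 80 detected / 20 missed (new), 90 detected / 10 missed (reference).
or_, low, high = odds_ratio_ci(80, 20, 90, 10)
print(f"odds ratio = {or_:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")

# Hypothetical false-negative IDs for two classifiers on one test set.
print(jaccard({"ad1", "ad2", "ad3"}, {"ad2", "ad3", "ad4"}))
```

Under these assumptions, an odds ratio below 1 means the classifier is less likely to detect ads in the new test set than in the reference set, and an interval that includes 1 means the change is not distinguishable from no effect at the 95% level. Averaging `jaccard` over all test sets for each classifier pair would give one cell of a heatmap like the one described for Figure 5.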