C-SEO Bench: Does Conversational SEO Work?

Haritz Puerto, Martin Gubri, Tommaso Green, Seong Joon Oh, Sangdoo Yun

TL;DR

This work introduces C-SEO Bench, a benchmark for evaluating conversational SEO (C-SEO) methods across two tasks (product recommendation and question answering) and six domains, with multi-adopter dynamics modeled through varying adoption rates. Across experiments with multiple LLMs, the study finds that most current C-SEO methods yield little or even negative gain in citation ranking, and that traditional SEO, which improves a source's ranking in the LLM context, has a much larger impact on LLM-generated citations. The results show that C-SEO behaves as a congested, zero-sum game in which gains shrink as more actors adopt similar methods, so strategies need to account for competition and baseline context. The paper concludes that C-SEO should complement, not replace, traditional SEO, and points to future work on synergistic approaches and broader, multi-domain evaluations.

Abstract

Large Language Models (LLMs) are transforming search engines into Conversational Search Engines (CSE). Consequently, Search Engine Optimization (SEO) is being shifted into Conversational Search Engine Optimization (C-SEO). We are beginning to see dedicated C-SEO methods for modifying web documents to increase their visibility in CSE responses. However, they are often tested only for a limited breadth of application domains; we do not know whether certain C-SEO methods would be effective for a broad range of domains. Moreover, existing evaluations consider only a single-actor scenario where only one web document adopts a C-SEO method; in reality, multiple players are likely to competitively adopt the cutting-edge C-SEO techniques, drawing an analogy from the dynamics we have seen in SEO. We present C-SEO Bench, the first benchmark designed to evaluate C-SEO methods across multiple tasks, domains, and numbers of actors. We consider two search tasks, question answering and product recommendation, with three domains each. We also formalize a new evaluation protocol with varying adoption rates among involved actors. Our experiments reveal that most current C-SEO methods are not only largely ineffective but also frequently have a negative impact on document ranking, the opposite of what is expected. Instead, traditional SEO strategies, those aiming to improve the ranking of the source in the LLM context, are significantly more effective. We also observe that as we increase the number of C-SEO adopters, the overall gains decrease, revealing the congested, zero-sum nature of the problem. Our code and data are available at https://github.com/parameterlab/c-seo-bench and https://huggingface.co/datasets/parameterlab/c-seo-bench.

Paper Structure

This paper contains 31 sections, 1 equation, 5 figures, 12 tables.

Figures (5)

  • Figure 1: C-SEO vs SEO. Comparison of the best C-SEO and SEO methods on our C-SEO Bench across all 6 domains. Left: The best C-SEO strategies still fall behind the best SEO performances. Moreover, C-SEO generally does not introduce any gain (close to 0 boost in ranking). Right: As the rate of actors adopting these methods increases, each actor experiences a smaller marginal gain from adopting C-SEO (as in SEO).
  • Figure 2: Conversational search engine setup. We illustrate the data pipeline for product recommendation. After applying a C-SEO method on the third document, its ranking gets boosted by $+2$ positions.
  • Figure 3: Example of C-SEO transformation. Content Improvement makes the text more attractive by highlighting key features (e.g., by bolding) and structuring the text to make the information more accessible.
  • Figure 4: Importance of traditional SEO. Average boost in ranking (y-axis) when the document is placed at a specific position in the LLM context (x-axis, SEO Baseline). The boosts of (at least) the first two positions are significant for all domains.
  • Figure 5: C-SEO is a zero-sum game. The average gain per adopter decreases (y-axis) with the increasing number of adopters (x-axis), for retail (dashed) and video games (dotted) on gpt-4o-mini.
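
The ranking boost in the Figure 2 caption and the zero-sum behavior in the Figure 5 caption can be illustrated with a minimal Python sketch. It assumes a boost is the number of positions a document gains between the citation ranking before and after a C-SEO method is applied; the function names and the toy document identifiers are hypothetical, and this is not the benchmark's own implementation.

```python
from statistics import mean


def rank_boost(before, after, doc_id):
    """Positions gained by doc_id; positive means it moved up the ranking.

    Assumption: `before` and `after` are lists of the same document ids,
    ordered from the top-ranked citation to the lowest-ranked one.
    """
    return before.index(doc_id) - after.index(doc_id)


def average_gain_per_adopter(before, after, adopters):
    """Mean boost over the documents that adopted a C-SEO method."""
    return mean(rank_boost(before, after, doc_id) for doc_id in adopters)


# Toy rankings (hypothetical data): the third document adopts a method
# and moves from position 3 to position 1, i.e. a boost of +2.
before = ["doc_a", "doc_b", "doc_c", "doc_d"]
after = ["doc_c", "doc_a", "doc_b", "doc_d"]

single_boost = rank_boost(before, after, "doc_c")

# Because `after` is a permutation of `before`, the boosts of all
# documents sum to zero: one document's gain is another's loss.
total_boost = sum(rank_boost(before, after, doc_id) for doc_id in before)

print(single_boost, total_boost)
```

Under this definition, boosts over a fixed set of documents always cancel out, so when every document adopts a method the average gain per adopter is necessarily zero. That is one simple way to read the shrinking gains as adoption grows.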