C-SEO Bench: Does Conversational SEO Work?

Haritz Puerto, Martin Gubri, Tommaso Green, Seong Joon Oh, Sangdoo Yun

TL;DR

This work introduces C-SEO Bench, a comprehensive benchmark for evaluating conversational SEO methods across two primary tasks (product recommendation and question answering) and six domains, incorporating realistic multi-adopter dynamics via an adoption-rate model. Across extensive experiments with multiple LLMs, the study finds that most current C-SEO methods yield negligible or even negative gains in citation ranking, and that traditional SEO (improving a document's retrieval ranking) has a much larger impact on LLM-generated citations. The results show that C-SEO behaves as a congested, zero-sum game in which gains shrink as more actors adopt similar methods, underscoring the need for strategies that account for competition and baseline context. The paper concludes that C-SEO should complement, not replace, traditional SEO, and it highlights future directions toward synergistic approaches and broader, multi-domain evaluations.

Abstract

Large Language Models (LLMs) are transforming search engines into Conversational Search Engines (CSE). Consequently, Search Engine Optimization (SEO) is shifting toward Conversational Search Engine Optimization (C-SEO). We are beginning to see dedicated C-SEO methods for modifying web documents to increase their visibility in CSE responses. However, these methods are often tested on only a narrow range of application domains, so it is unclear whether a given C-SEO method is effective across domains. Moreover, existing evaluations consider only a single-actor scenario in which just one web document adopts a C-SEO method; in reality, by analogy with the dynamics observed in SEO, multiple players are likely to adopt cutting-edge C-SEO techniques competitively. We present C-SEO Bench, the first benchmark designed to evaluate C-SEO methods across multiple tasks, domains, and numbers of actors. We consider two search tasks, question answering and product recommendation, with three domains each. We also formalize a new evaluation protocol with varying adoption rates among the involved actors. Our experiments reveal that most current C-SEO methods are not only largely ineffective but also frequently have a negative impact on document ranking, the opposite of what is expected. In contrast, traditional SEO strategies, which aim to improve the ranking of the source in the LLM context, are significantly more effective. We also observe that as the number of C-SEO adopters increases, the overall gains decrease, reflecting the congested, zero-sum nature of the problem. Our code and data are available at https://github.com/parameterlab/c-seo-bench and https://huggingface.co/datasets/parameterlab/c-seo-bench.
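The adoption-rate idea and the "zero-sum" observation can be illustrated with a toy simulation. This is a minimal, hypothetical sketch, not the benchmark's actual protocol (the linked repository holds that). It makes three assumptions: citation order simply follows a per-document score, each adopting document's score rises by a constant `boost`, and the helper names `ranks`, `simulate_round`, and `mean_adopter_gain` are invented for illustration. The rankings before and after adoption are both permutations of 1..n, so rank gains across all documents always sum to zero. Adopters can only move up at the expense of non-adopters.

```python
import random
from statistics import mean


def ranks(scores):
    """Map each document id to its 1-based rank (1 = highest score)."""
    order = sorted(scores, key=lambda d: scores[d], reverse=True)
    return {d: r for r, d in enumerate(order, start=1)}


def simulate_round(num_docs, adoption_rate, boost, rng):
    """One toy round in which a fraction of documents adopt a C-SEO edit.

    Hypothetical model (not from the paper): every adopter's score rises
    by the same constant `boost`, and citation order follows the score.
    """
    docs = list(range(num_docs))
    base = {d: rng.random() for d in docs}  # hypothetical baseline scores
    adopters = set(rng.sample(docs, round(adoption_rate * num_docs)))
    boosted = {d: base[d] + (boost if d in adopters else 0.0) for d in docs}
    before, after = ranks(base), ranks(boosted)
    # Positive gain = the document moved up the citation list.
    gains = {d: before[d] - after[d] for d in docs}
    return adopters, gains


def mean_adopter_gain(num_docs=10, adoption_rate=0.5, boost=1.0,
                      trials=200, seed=0):
    """Average rank gain of adopting documents over many random rounds."""
    rng = random.Random(seed)
    per_round = []
    for _ in range(trials):
        adopters, gains = simulate_round(num_docs, adoption_rate, boost, rng)
        if adopters:  # skip rounds in which nobody adopts
            per_round.append(mean(gains[d] for d in adopters))
    return mean(per_round) if per_round else 0.0


if __name__ == "__main__":
    for rate in (0.1, 0.5, 1.0):
        gain = mean_adopter_gain(adoption_rate=rate)
        print(f"adoption rate {rate:.1f}: mean adopter gain {gain:.2f}")
```

Under these assumptions, when k of n documents adopt, the expected mean gain per adopter is (n - k)/2. It shrinks to zero at full adoption, where a uniform boost leaves the order unchanged. Real C-SEO methods do not apply a uniform additive boost, so this toy only illustrates the congestion effect qualitatively.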
