What Generative Search Engines Like and How to Optimize Web Content Cooperatively

Yujiang Wu, Shanshan Zhong, Yubin Kim, Chenyan Xiong

TL;DR

Generative Engines (GEs) reshape search by synthesizing responses from retrieved documents, which creates the need for Generative Engine Optimization (GEO): optimizing content visibility within GE outputs. AutoGEO introduces a data-driven pipeline that learns engine preferences as explicit rules and builds two GEO models: a plug-and-play, prompt-based AutoGEO$_\text{API}$ and a cost-efficient, RL-based AutoGEO$_\text{Mini}$. Across GEO-Bench, Researchy-GEO, and E-commerce, and with Gemini, Claude, and GPT, AutoGEO delivers substantial GEO gains while preserving Generative Engine Utility (GEU); AutoGEO$_\text{Mini}$ achieves comparable performance at roughly 0.0071x the cost of AutoGEO$_\text{API}$. The work offers a principled, scalable path to cooperative GEO and lays the groundwork for extending it to agentic or multimodal GE paradigms with multi-stakeholder considerations.

Abstract

By employing large language models (LLMs) to retrieve documents and generate natural language responses, Generative Engines, such as Google AI Overview and ChatGPT, provide significantly enhanced user experiences and have rapidly become the new form of search. Their rapid adoption also drives the need for Generative Engine Optimization (GEO), as content providers are eager to gain more traction from them. In this paper, we introduce AutoGEO, a framework that automatically learns generative engine preferences for using retrieved content in response generation, and rewrites web content to gain more such traction. AutoGEO first prompts frontier LLMs to explain generative engine preferences and extracts meaningful preference rules from these explanations. It then uses the preference rules as context engineering for AutoGEO$_\text{API}$, a prompt-based GEO system, and as rule-based rewards to train AutoGEO$_\text{Mini}$, a cost-effective GEO model. Experiments on the standard GEO-Bench and two newly constructed benchmarks using real user queries demonstrate the effectiveness of AutoGEO in enhancing content traction while preserving search utility. Analyses confirm the learned rules' robustness and their ability to capture unique preferences in various domains, as well as the AutoGEO systems' ability to embed them in content optimization. The code is released at https://github.com/cxcscmu/AutoGEO.
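
The two uses of preference rules mentioned in the abstract (as prompt context and as a rule-based reward) can be illustrated with a rough sketch. This is not the authors' implementation: the rule texts, the function names `build_rewrite_prompt` and `rule_based_reward`, and the `judge` callable are all hypothetical, and the reward shown here (the fraction of rules judged satisfied) is only one plausible choice. In practice the judge would presumably be an LLM call; a toy keyword check stands in for it so the example runs offline.

```python
from typing import Callable, List

# Hypothetical preference rules for illustration; not taken from the paper.
RULES: List[str] = [
    "State the main answer early in the document.",
    "Support claims with concrete, verifiable facts.",
]


def build_rewrite_prompt(document: str, rules: List[str]) -> str:
    """Place the preference rules in the rewriting prompt (context engineering)."""
    numbered = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))
    return (
        "Rewrite the document so that it follows these rules while "
        "preserving its meaning.\n\n"
        f"Rules:\n{numbered}\n\nDocument:\n{document}"
    )


def rule_based_reward(
    rewritten: str,
    rules: List[str],
    judge: Callable[[str, str], bool],
) -> float:
    """Return the fraction of rules that the judge marks as satisfied."""
    if not rules:
        return 0.0
    satisfied = sum(1 for rule in rules if judge(rewritten, rule))
    return satisfied / len(rules)


def toy_judge(text: str, rule: str) -> bool:
    """Toy stand-in for an LLM judge (assumption, for offline use only).

    A rule counts as satisfied when its final word appears in the text.
    """
    keyword = rule.rstrip(".").split()[-1].lower()
    return keyword in text.lower()


if __name__ == "__main__":
    print(build_rewrite_prompt("Our store sells hiking boots.", RULES))
    print(rule_based_reward("This document lists prices.", RULES, toy_judge))
```

Keeping the judge as a plain callable lets the same reward function be driven by a toy check, a classifier, or an LLM without changing the scoring logic.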

Paper Structure

This paper contains 40 sections, 4 equations, 3 figures, 11 tables, 2 algorithms.

Figures (3)

  • Figure 1: Overview of the proposed AutoGEO framework.
  • Figure 2: Left: Rule overlap (%) across (a) different LLMs on Researchy-GEO and (b) different datasets using the Gemini generative engine. Right: Transferability of AutoGEO$_\text{API}$ rule sets across (c) different LLM-based engines on Researchy-GEO and (d) different datasets on Gemini. $S_{\text{Self}}$ is a rule set derived from the same LLM or dataset as the generative engine, while $S_{\text{Gemini}}$ and $S_{\text{Researchy-GEO}}$ denote the same rule set, extracted from Gemini on Researchy-GEO.
  • Figure 3: GEO performance of individual rules for AutoGEO$_\text{API}$ on the Gemini generative engine.