
RLRF: Competitive Search Agent Design via Reinforcement Learning from Ranker Feedback

Tommy Mordo, Sagie Dekel, Omer Madmon, Moshe Tennenholtz, Oren Kurland

TL;DR

This paper addresses competitive search, in which publishers (here, LLM-based agents) modify their documents to improve their rankings under dynamic competition. It introduces Reinforcement Learning from Ranker Feedback (RLRF), which trains RL-aligned agents (RA agents) with Direct Preference Optimization (DPO) on synthetic preference data produced by two methods, Static Generation and Dynamic Generation, in ranking games. RA agents consistently outperform non-aligned (NA) baselines, generalize to ranking functions not seen during training, and adapt to strategic opponents; Dynamic Generation yields stronger agents than Static Generation. The work demonstrates that RL-based alignment is viable for publisher-driven content optimization in information retrieval, using a training pipeline that requires no human-authored data and extends to out-of-distribution rankers and multi-agent settings. It also addresses faithfulness considerations and transferability across ranking functions, both of which matter for building robust competitive search systems.
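For reference, the standard DPO objective for a single preference pair can be computed from sequence log-probabilities as below. This is a generic sketch, not the paper's training code: the function name and the default `beta` are assumptions, and the paper's exact DPO configuration is not given in this summary. Here the "chosen" and "rejected" responses would be two modified documents, with the chosen one being the document the ranker preferred.

```python
import math


def dpo_loss(policy_logp_chosen: float, policy_logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair.

    Each argument is the summed token log-probability of a full response
    (e.g. a modified document) under the trained policy or the frozen
    reference model. beta=0.1 is an assumed default, not the paper's value.
    """
    # Implicit reward margin: how much more the policy, relative to the
    # reference model, favors the chosen response over the rejected one.
    margin = ((policy_logp_chosen - ref_logp_chosen)
              - (policy_logp_rejected - ref_logp_rejected))
    # -log(sigmoid(beta * margin)) == softplus(-beta * margin);
    # the max/log1p form avoids overflow for large |margin|.
    z = -beta * margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


# A zero margin gives log(2); a positive margin (policy already prefers the
# ranker's winner more than the reference does) gives a smaller loss.
print(dpo_loss(0.0, 0.0, 0.0, 0.0))
print(dpo_loss(-10.0, -20.0, -12.0, -18.0))
```

Minimizing this loss pushes the policy to raise the relative likelihood of ranker-preferred documents while the reference terms keep it anchored to the original model.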

Abstract

Competitive search is a setting in which document publishers modify their documents to improve their ranking in response to a query. Recently, publishers have increasingly leveraged LLMs to generate and modify competitive content. We introduce Reinforcement Learning from Ranker Feedback (RLRF), a framework that trains LLMs using preference datasets derived from ranking competitions. The goal of a publisher (LLM-based) agent is to optimize content for improved ranking while accounting for the strategies of competing agents. We generate the datasets using approaches that do not rely on human-authored data. We show that our proposed agents consistently and substantially outperform previously suggested approaches for LLM-based competitive document modification. We further show that our agents are effective with ranking functions they were not trained for (i.e., out of distribution) and that they adapt to strategic opponents. These findings support the significant potential of reinforcement learning in competitive search.
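One way to turn ranker feedback into preference data is to rank several candidate modifications of a document with the ranking function and pair the top-ranked candidate (chosen) with the bottom-ranked one (rejected). The sketch below assumes exactly that construction; it is an illustration, not the paper's Static or Dynamic Generation procedure. `toy_ranker_score` is a hypothetical term-overlap stand-in for a real ranker, and `PreferencePair` and `build_preference_pair` are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PreferencePair:
    prompt: str    # the query plus the agent's current document
    chosen: str    # candidate the ranker placed highest
    rejected: str  # candidate the ranker placed lowest


def toy_ranker_score(query: str, document: str) -> float:
    """Hypothetical stand-in for a real ranking function:
    fraction of query terms that also appear in the document."""
    q_terms = set(query.lower().split())
    d_terms = set(document.lower().split())
    return len(q_terms & d_terms) / max(len(q_terms), 1)


def build_preference_pair(query: str, current_doc: str, candidates: List[str],
                          score: Callable[[str, str], float] = toy_ranker_score
                          ) -> PreferencePair:
    """Rank candidate modifications with the ranker; pair best vs. worst."""
    if len(candidates) < 2:
        raise ValueError("need at least two candidates to form a pair")
    # Highest ranker score first.
    ranked = sorted(candidates, key=lambda d: score(query, d), reverse=True)
    prompt = f"Query: {query}\nCurrent document: {current_doc}"
    return PreferencePair(prompt=prompt, chosen=ranked[0], rejected=ranked[-1])


pair = build_preference_pair(
    "solar panel efficiency",
    "Old text.",
    ["Solar panel efficiency depends on cell type.",
     "Panels are nice.",
     "Efficiency of wind turbines."],
)
print(pair.chosen)    # the candidate covering all three query terms
print(pair.rejected)  # the candidate covering none of them
```

Pairs whose candidates receive equal scores carry no preference signal, so a practical pipeline would likely filter them out before training.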

Paper Structure

This paper contains 45 sections, 7 equations, 11 figures, 8 tables, 2 algorithms.

Figures (11)

  • Figure 1: RLRF Agent: Static Generation
  • Figure 2: The faithfulness score of the RA agent and the NA agent for the He and DG (left), Ho and DG (middle), and He and SG (right) settings, where DG and SG denote Dynamic Generation and Static Generation.
  • Figure 3: Illustration of a single game within a ranking competition. Each competition consists of multiple games. Each game is assigned a query and consists of multiple rounds of agent interaction. In each round, agents modify their documents and receive the rankings of each document.
  • Figure 4: The prompt for generating the pseudo-relevant document.
  • Figure 5: The prompt for generating the modified documents without feedback on past rankings.
  • ...and 6 more figures
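The game structure described in the Figure 3 caption (one query per game, multiple rounds, agents modify their documents and then receive the rankings) can be sketched as a simple loop. This is a minimal sketch under stated assumptions: the `Agent` interface, `play_game`, the term-overlap scorer, and the two toy agents are all hypothetical, and agents are assumed to modify their documents simultaneously while seeing only the previous round's ranking. The paper's LLM agents and rankers are not reproduced here.

```python
from typing import Callable, Dict, List

# Hypothetical agent interface: (query, own current document,
# previous round's ranking as agent names, best first) -> new document.
Agent = Callable[[str, str, List[str]], str]


def play_game(query: str, agents: Dict[str, Agent], initial_docs: Dict[str, str],
              score: Callable[[str, str], float], num_rounds: int) -> List[List[str]]:
    """Run one game and return the ranking (agent names, best first) of each round."""
    docs = dict(initial_docs)
    last_ranking: List[str] = []
    history: List[List[str]] = []
    for _ in range(num_rounds):
        # All agents modify their documents simultaneously (assumed).
        docs = {name: agent(query, docs[name], last_ranking)
                for name, agent in agents.items()}
        # The ranker orders every agent's document for the query.
        last_ranking = sorted(docs, key=lambda name: score(query, docs[name]),
                              reverse=True)
        history.append(last_ranking)
    return history


def overlap_score(query: str, document: str) -> float:
    # Toy ranker: fraction of query terms present in the document.
    q = set(query.lower().split())
    return len(q & set(document.lower().split())) / max(len(q), 1)


def append_query(query: str, doc: str, ranking: List[str]) -> str:
    # Toy agent for illustration only: appends the query text to its document.
    return doc + " " + query


def idle(query: str, doc: str, ranking: List[str]) -> str:
    # Toy agent: never changes its document.
    return doc


history = play_game("solar panel efficiency",
                    {"a": append_query, "b": idle},
                    {"a": "Cells convert light.", "b": "Solar panel guide."},
                    overlap_score, num_rounds=2)
print(history)
```

Recording each round's ranking is what makes the setup useful for training: the per-round outcomes are the kind of ranker feedback from which preference data can be derived.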