LimRank: Less is More for Reasoning-Intensive Information Reranking

Tingyu Song, Yilun Zhao, Siyue Zhang, Chen Zhao, Arman Cohan

TL;DR

This paper tackles the data inefficiency of fine-tuning LLMs for reasoning-intensive information retrieval by introducing LimRank-Synthesizer, an open-source pipeline that generates high-quality, diverse synthetic data. Fine-tuning a 7B model (LimRank) on just 20K examples yields competitive results on reasoning benchmarks (BRIGHT) and instruction-following benchmarks (FollowIR), with strong generalization to downstream tasks like LitSearch and GPQA. Ablation studies confirm the necessity of each guideline component and demonstrate the advantages of LimRank-Synthesizer over other synthetic-data approaches. Overall, the work provides empirical support for a less-is-more approach in IR, offering a practical, scalable alternative to large-scale data collection and fine-tuning.

Abstract

Existing approaches typically rely on large-scale fine-tuning to adapt LLMs for information reranking tasks, which is computationally expensive. In this work, we demonstrate that modern LLMs can be effectively adapted using only minimal, high-quality supervision. To enable this, we design LIMRANK-SYNTHESIZER, a reusable and open-source pipeline for generating diverse, challenging, and realistic reranking examples. Using this synthetic data, we fine-tune our reranker model, LIMRANK. We evaluate LIMRANK on two challenging benchmarks, i.e., BRIGHT for reasoning-intensive retrieval and FollowIR for instruction-following retrieval. Our experiments demonstrate that LIMRANK achieves competitive performance, while being trained on less than 5% of the data typically used in prior work. Further ablation studies demonstrate the effectiveness of LIMRANK-SYNTHESIZER and the strong generalization capabilities of LIMRANK across downstream tasks, including scientific literature search and retrieval-augmented generation for knowledge-intensive problem solving.

Paper Structure

This paper contains 41 sections, 12 figures, 9 tables.

Figures (12)

  • Figure 1: (Top) An illustration of reasoning-intensive reranking scenarios that demand more than surface-level semantic matching. These tasks require multi-step inference, contextual reasoning, and recognition of implicit relationships between queries and documents. (Bottom) An overview of LimRank-Synthesizer, which generates high-quality training data for reranking tasks.
  • Figure 2: Prompt for query expansion in daily life scenario.
  • Figure 3: Prompt for query expansion in daily life scenario.
  • Figure 4: Prompt for query expansion in daily life scenario.
  • Figure 5: Prompt for problem solving.
  • ...and 7 more figures