TFRank: Think-Free Reasoning Enables Practical Pointwise LLM Ranking

Yongqi Fan, Xiaoyang Chen, Dezhi Ye, Jie Liu, Haijin Liang, Jin Ma, Ben He, Yingfei Sun, Tong Ruan

TL;DR

TFRank tackles the problem of deploying reasoning-enabled ranking under strict efficiency constraints by training small-scale LLMs with multi-task supervision that includes Chain-of-Thought signals, then deploying a Think-Free inference mode that outputs relevance scores directly. The approach combines pointwise ranking with CoT distillation, fine-grained supervision, and optional GRPO optimization to achieve high accuracy with far lower latency than CoT-dependent baselines. Empirical results on BRIGHT show small backbones (as small as 1.7B parameters) rivaling much larger baselines, while BEIR results demonstrate competitive zero-shot generalization and strong efficiency gains. The Think-Free phenomenon indicates that reasoning ability learned during training can be invoked at inference without producing explicit reasoning chains, which makes deployment in real-world IR systems practical.

Abstract

Reasoning-intensive ranking models built on Large Language Models (LLMs) have made notable progress. However, existing approaches often rely on large-scale LLMs and explicit Chain-of-Thought (CoT) reasoning, resulting in high computational cost and latency that limit real-world use. To address this, we propose TFRank, an efficient pointwise reasoning ranker based on small-scale LLMs. To improve ranking performance, TFRank effectively integrates CoT data, fine-grained score supervision, and multi-task training. Furthermore, it achieves an efficient "Think-Free" reasoning capability by employing a "think-mode switch" and pointwise format constraints. Specifically, this allows the model to leverage explicit reasoning during training while delivering precise relevance scores for complex queries at inference without generating any reasoning chains. Experiments show that TFRank achieves performance comparable to models with four times more parameters on the BRIGHT benchmark and demonstrates strong competitiveness on the BEIR benchmark. Further analysis shows that TFRank achieves an effective balance between performance and efficiency, providing a practical solution for integrating advanced reasoning into real-world systems. Our code and data are released in the repository: https://github.com/JOHNNY-fans/TFRank.
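
The pointwise, think-free inference flow described in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the `build_prompt` and `rank_pointwise` helpers, the prompt wording, the `/think` and `/no_think` spellings of the mode switch, and the `toy_score` function that stands in for a real model call. It is not the released TFRank code, and the actual prompt templates are those shown in Figure A1.

```python
from typing import Callable, List, Tuple


def build_prompt(query: str, document: str, think: bool = False) -> str:
    """Build a pointwise prompt for one (query, document) pair.

    The wording and the trailing mode switch are illustrative assumptions.
    """
    switch = "/think" if think else "/no_think"
    return (
        f"Query: {query}\n"
        f"Document: {document}\n"
        f"Rate the relevance of the document to the query. {switch}"
    )


def rank_pointwise(
    query: str,
    documents: List[str],
    score_fn: Callable[[str], float],
    think: bool = False,
) -> List[Tuple[str, float]]:
    """Score every document independently, then sort by descending score."""
    scored = [
        (doc, score_fn(build_prompt(query, doc, think))) for doc in documents
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def toy_score(prompt: str) -> float:
    """Hypothetical stand-in for a model call.

    Counts how many query terms occur in the document text. A real ranker
    would send the prompt to an LLM and parse the returned relevance score.
    """
    lines = prompt.split("\n")
    query_terms = lines[0][len("Query: "):].lower().split()
    doc_text = lines[1][len("Document: "):].lower()
    return float(sum(term in doc_text for term in query_terms))


if __name__ == "__main__":
    docs = ["cooking pasta", "sorting lists in python", "python basics"]
    for doc, score in rank_pointwise("python sorting", docs, toy_score):
        print(f"{score:.1f}  {doc}")
```

Because each document is scored on its own, the calls to `score_fn` are independent and can be batched or parallelized, which is one reason a pointwise ranker that skips reasoning output can keep latency low.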

Paper Structure

This paper contains 39 sections, 5 equations, 26 figures, 9 tables.

Figures (26)

  • Figure 1: Overview of the TFRank framework. (a) summarizes key performance and efficiency considerations in the model design. (b) provides an example of inference with the TFRank model.
  • Figure 2: Size and efficiency trade-offs for ranking performance on the BRIGHT benchmark. (a) NDCG@10 versus model size for different ranker families; (b) NDCG@10 versus processed queries per hour (efficiency). All TFRank models are trained on the Qwen3 series.
  • Figure 3: Score distributions for a random 1% sample of BRIGHT, evaluated by TFRank-0.6B (Qwen3) under the /think and /no_think inference modes.
  • Figure 4: Training time (hours) versus ranking performance (NDCG@10 on BRIGHT) for different training strategies. All TFRank models use the Qwen3 series backbone. The meaning of the ‡ symbol follows that in Table \ref{tab:bright_yesno_grpo_pytrec_eval}.
  • Figure A1: Prompt templates for TFRank inference. (a) depicts the completion prompt for the /think mode, which requires the model to output an explicit reasoning process. (b) shows the template for the /no_think mode, which instructs the model to return a relevance score directly, without reasoning.
  • ...and 21 more figures
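
Several of the figures above report NDCG@10. As a reminder of what that metric measures, here is a minimal Python sketch using the linear-gain form of DCG. The function names are hypothetical, and the sketch makes a simplifying assumption: the ideal ranking is built only from the relevance labels passed in, whereas a full evaluation toolkit builds it from all judged documents for the query.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k positions (linear gain)."""
    return sum(
        rel / math.log2(position + 2)  # position 0 gets discount log2(2) = 1
        for position, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(ranked_relevances: Sequence[float], k: int = 10) -> float:
    """NDCG@k for relevance labels listed in the order the ranker returned them.

    Simplification: the ideal ordering is derived from the same labels.
    """
    ideal_dcg = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents, so the score is defined as 0
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


if __name__ == "__main__":
    # Labels of the documents in ranked order, e.g. 0 = irrelevant, 1 = relevant.
    print(ndcg_at_k([1, 0, 1, 0, 0]))
```

A perfectly ordered list scores 1.0, and placing relevant documents lower in the list reduces the score through the logarithmic discount.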