$\text{R}^2\text{R}$: A Route-to-Rerank Post-Training Framework for Multi-Domain Decoder-Only Rerankers

Xinyu Wang, Hanwei Wu, Qingchen Hu, Zhenghan Tai, Jingrui Tian, Lei Ding, Jijun Chi, Hailin He, Tung Sum Thomas Kwok, Yufei Cui, Sicheng Lyu, Muzhi Li, Mingze Li, Xinyue Yu, Ling Zhou, Peng Lu

TL;DR

Decoder-only rerankers underperform in specialized domains, and naive fine-tuning leads to surface-form overfitting and forgetting. The paper introduces Route-to-Rerank ($\text{R}^2\text{R}$), a modular post-training framework that combines a two-stage training strategy, Entity Abstraction for Generalization (EAG), with a Latent Semantic Router. The router reads internal representations of a frozen backbone and dynamically activates a domain LoRA expert for each query: it scores domains with $p(d|q)=\text{softmax}(W_r h_q + b_r)$ and applies the selected expert as $\theta(q)=\theta_{\text{base}}+\Delta\theta_{\phi(q)}$. In EAG, Stage 1 abstracts entities to force the reranker to learn invariant relevance patterns, and Stage 2 fine-tunes on the original target data. EAG is guided by an automated, retriever-based data-curation process and a contrastive loss $\mathcal{L}_{\text{contrastive}}=-\log\frac{\exp(s(q,c^+)/\tau)}{\exp(s(q,c^+)/\tau)+\sum_j \exp(s(q,c_j^-)/\tau)}$. Empirically, $\text{R}^2\text{R}$ yields consistent in-domain gains across legal, medical, and financial domains for multiple backbones, and the Latent Semantic Router delivers the best routing and end-to-end reranking performance at low overhead. These results confirm the approach's model-agnostic applicability and its practical value for robust cross-domain RAG systems.
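The routing rule and the contrastive loss quoted in the TL;DR can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes $\phi(q)$ is the argmax of $p(d|q)$, treats parameters as flat lists of floats, and uses hypothetical names (`route`, `apply_expert`, `contrastive_loss`) and an arbitrary default temperature.

```python
import math
from typing import Dict, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(
    h_q: Sequence[float],
    W_r: Sequence[Sequence[float]],
    b_r: Sequence[float],
    domains: Sequence[str],
) -> Tuple[str, List[float]]:
    """Compute p(d|q) = softmax(W_r h_q + b_r) and pick one domain.

    Assumption: phi(q) is the argmax of p(d|q).
    """
    logits = [
        sum(w * h for w, h in zip(row, h_q)) + b
        for row, b in zip(W_r, b_r)
    ]
    probs = softmax(logits)
    best = max(range(len(domains)), key=lambda i: probs[i])
    return domains[best], probs


def apply_expert(
    theta_base: Sequence[float],
    deltas: Dict[str, Sequence[float]],
    domain: str,
) -> List[float]:
    """theta(q) = theta_base + delta_theta for the routed domain."""
    return [t + d for t, d in zip(theta_base, deltas[domain])]


def contrastive_loss(
    s_pos: float, s_negs: Sequence[float], tau: float = 0.05
) -> float:
    """-log( exp(s+/tau) / (exp(s+/tau) + sum_j exp(s_j-/tau)) ).

    The default tau is an illustrative value, not taken from the paper.
    """
    logits = [s_pos / tau] + [s / tau for s in s_negs]
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]
```

With one positive and one negative that score identically, `contrastive_loss` returns $\log 2$, which is a quick sanity check that the denominator includes the positive term.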

Abstract

Decoder-only rerankers are central to Retrieval-Augmented Generation (RAG). However, generalist models miss domain-specific nuances in high-stakes fields like finance and law, and naive fine-tuning causes surface-form overfitting and catastrophic forgetting. To address this challenge, we introduce R2R, a domain-aware framework that combines dynamic expert routing with a two-stage training strategy, Entity Abstraction for Generalization (EAG). EAG introduces a counter-shortcut mechanism by masking the most predictive surface cues, forcing the reranker to learn domain-invariant relevance patterns rather than memorizing dataset-specific entities. To efficiently activate domain experts, R2R employs a lightweight Latent Semantic Router that probes internal representations from the frozen backbone decoder to select the optimal LoRA expert per query. Extensive experiments across different reranker backbones and diverse domains (legal, medical, and financial) demonstrate that R2R consistently surpasses generalist and single-domain fine-tuned baselines. Our results confirm that R2R is a model-agnostic and modular approach to domain specialization with strong cross-domain robustness.
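To make the entity-abstraction idea concrete, the sketch below replaces known entity mentions with typed placeholders, producing the kind of text a Stage 1 model would train on, while Stage 2 would use the untouched original. Everything specific here is an assumption for illustration: the abstract does not say how entities are detected or what the placeholders look like, so the entity dictionary, the `[TYPE]` format, and the function name `abstract_entities` are hypothetical.

```python
import re
from typing import Dict


def abstract_entities(text: str, entities: Dict[str, str]) -> str:
    """Replace each known entity surface form with a typed placeholder.

    `entities` maps a surface form to a type label, e.g. {"Acme": "ORG"}.
    In practice this mapping would come from an entity recognizer; here
    it is supplied by hand (an assumption of this sketch).
    """
    if not entities:
        return text
    # Longest surface forms first, so "Acme Holdings" wins over "Acme".
    surfaces = sorted(entities, key=len, reverse=True)
    # Lookarounds keep matches on whole tokens without relying on \b,
    # which misbehaves for surface forms ending in punctuation.
    pattern = re.compile(
        "|".join(r"(?<!\w)" + re.escape(s) + r"(?!\w)" for s in surfaces)
    )
    return pattern.sub(lambda m: f"[{entities[m.group(0)]}]", text)


# Stage 1 would see the abstracted text, Stage 2 the original text.
original = "Acme Holdings sued Acme in Delaware."
stage1_text = abstract_entities(
    original,
    {"Acme Holdings": "ORG", "Acme": "ORG", "Delaware": "LOC"},
)
```

Here `stage1_text` is `"[ORG] sued [ORG] in [LOC]."`: the relational structure of the sentence survives while the dataset-specific names do not.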

Paper Structure

This paper contains 23 sections, 6 equations, 3 figures, 3 tables, 3 algorithms.

Figures (3)

  • Figure 1: The impact of accurate domain routing on reranking quality. (A) A domain-aware router correctly activates a LoRA expert for a given query, maximizing in-domain expertise and precision. (B) Expert selection without proper routing results in domain mismatch and suboptimal reranking performance.
  • Figure 2: Overview of the $\text{R}^2\text{R}$ framework. Top: The full Route-to-Rerank pipeline. The two-stage EAG curriculum first abstracts entities to learn invariant relevance patterns, then specializes on original domain data to produce domain-specific LoRA experts. During inference, the Latent Semantic Router probes the frozen backbone to select the appropriate expert. Bottom: The LoRA-augmented transformer block, where lightweight domain-specific LoRA adapters attach to the frozen reranker and are dynamically activated by the router.
  • Figure 3: Reranker performance across training stages on Lotus and Zeekr datasets (PT=Pretrained, S1=Stage 1, S2=Stage 2). Dashed lines show direct fine-tuning (PT+S2), while solid lines show the two-stage EAG pipeline (PT+S1+S2). EAG consistently outperforms direct fine-tuning across all metrics.
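
The LoRA-augmented block described in the Figure 2 caption can be illustrated with the standard LoRA update, $y = Wx + \frac{\alpha}{r} BAx$, where $W$ stays frozen and only the low-rank pair $(A, B)$ differs per domain. The sketch below is a generic LoRA forward pass in plain Python, not the paper's code; the function names, the per-domain dictionary, and the default `alpha` are assumptions.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def matvec(M: Matrix, x: Sequence[float]) -> List[float]:
    """Multiply matrix M (rows x cols) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def lora_forward(
    x: Sequence[float],
    W: Matrix,
    experts: Dict[str, Tuple[Matrix, Matrix]],
    domain: Optional[str],
    alpha: float = 16.0,
) -> List[float]:
    """Frozen linear layer plus the routed domain's low-rank update.

    experts[domain] = (A, B) with A of shape (r, d_in) and B of shape
    (d_out, r). With domain=None only the frozen base weights are used.
    """
    base = matvec(W, x)
    if domain is None:
        return base
    A, B = experts[domain]
    rank = len(A)
    low_rank = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * l for b, l in zip(base, low_rank)]
```

Because the base weights are shared and only `(A, B)` is swapped, switching experts per query changes a small number of parameters, which is consistent with the low routing overhead the summary reports.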