$\text{R}^2\text{R}$: A Route-to-Rerank Post-Training Framework for Multi-Domain Decoder-Only Rerankers
Xinyu Wang, Hanwei Wu, Qingchen Hu, Zhenghan Tai, Jingrui Tian, Lei Ding, Jijun Chi, Hailin He, Tung Sum Thomas Kwok, Yufei Cui, Sicheng Lyu, Muzhi Li, Mingze Li, Xinyue Yu, Ling Zhou, Peng Lu
TL;DR
The paper addresses two problems with decoder-only rerankers: they underperform in specialized domains, and naive fine-tuning causes surface-form overfitting and catastrophic forgetting. It introduces Route-to-Rerank ($\text{R}^2\text{R}$), a modular post-training framework that combines a two-stage training strategy, Entity Abstraction for Generalization (EAG), with a Latent Semantic Router. The router reads the internal representation $h_q$ of a query from a frozen backbone, predicts a domain distribution $p(d|q)=\text{softmax}(W_r h_q + b_r)$, and activates the corresponding domain LoRA expert. The effective weights are then $\theta(q)=\theta_{\text{base}}+\Delta\theta_{\phi(q)}$, where $\phi(q)$ is the selected expert. In EAG, Stage 1 trains on data whose entities have been abstracted, forcing the model to learn invariant relevance patterns, and Stage 2 fine-tunes on the original target-domain data. Training is guided by an automated, retriever-based data-curation process and the contrastive loss $\mathcal{L}_{\text{contrastive}}=-\log\frac{\exp(s(q,c^+)/\tau)}{\exp(s(q,c^+)/\tau)+\sum_j \exp(s(q,c_j^-)/\tau)}$, where $s(q,c)$ is the relevance score of candidate $c$ for query $q$, $c^+$ is a positive candidate, $c_j^-$ are negative candidates, and $\tau$ is a temperature. Empirically, $\text{R}^2\text{R}$ yields consistent in-domain gains across the legal, medical, and financial domains for multiple backbones, and the Latent Semantic Router delivers the best routing and end-to-end reranking performance at low overhead. These results support the approach's model-agnostic applicability and its practical value for robust cross-domain RAG systems.
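The routing rule, weight composition, and loss above can be made concrete in a few lines. The following is a minimal pure-Python sketch, not the authors' implementation. The toy dimensions, the hard argmax choice for $\phi(q)$, the flattened parameter vectors standing in for LoRA deltas, and the helper names (`route`, `compose_weights`, `contrastive_loss`) are all illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable softmax: shift by the max logit before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(
    h_q: Sequence[float],
    W_r: Sequence[Sequence[float]],
    b_r: Sequence[float],
    domains: Sequence[str],
) -> Tuple[str, List[float]]:
    # p(d|q) = softmax(W_r h_q + b_r); W_r has one row per domain.
    logits = [sum(w * h for w, h in zip(row, h_q)) + b for row, b in zip(W_r, b_r)]
    probs = softmax(logits)
    # phi(q): hard routing to the most probable domain (an assumption of this sketch).
    best = max(range(len(domains)), key=lambda i: probs[i])
    return domains[best], probs


def compose_weights(
    theta_base: Sequence[float], deltas: Dict[str, Sequence[float]], domain: str
) -> List[float]:
    # theta(q) = theta_base + delta_theta_{phi(q)}; parameters are flattened
    # vectors here, whereas a real LoRA delta would be a low-rank product B @ A.
    return [t + d for t, d in zip(theta_base, deltas[domain])]


def contrastive_loss(s_pos: float, s_negs: Sequence[float], tau: float) -> float:
    # -log(exp(s+/tau) / (exp(s+/tau) + sum_j exp(s_j-/tau))),
    # evaluated as logsumexp(all logits) minus the positive logit for stability.
    logits = [s_pos / tau] + [s / tau for s in s_negs]
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denom - logits[0]


# Usage with toy numbers: a 2-d hidden state and three domain experts.
domains = ["legal", "medical", "financial"]
W_r = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b_r = [0.0, 0.0, 0.0]
domain, probs = route([2.0, 0.5], W_r, b_r, domains)
deltas = {"legal": [0.01, -0.02], "medical": [0.0, 0.03], "financial": [-0.01, 0.0]}
theta = compose_weights([0.1, 0.2], deltas, domain)
loss = contrastive_loss(0.9, [0.1, 0.2], tau=1.0)
```

With these toy values the router logits are $[2.0, 0.5, -2.5]$, so the query is sent to the legal expert. Computing the loss as log-sum-exp minus the positive logit avoids overflow when $\tau$ is small.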
Abstract
Decoder-only rerankers are central to Retrieval-Augmented Generation (RAG). However, generalist models miss domain-specific nuances in high-stakes fields such as finance and law, while naive fine-tuning causes surface-form overfitting and catastrophic forgetting. To address this challenge, we introduce $\text{R}^2\text{R}$, a domain-aware framework that combines dynamic expert routing with a two-stage training strategy, Entity Abstraction for Generalization (EAG). EAG acts as a counter-shortcut mechanism: it masks the most predictive surface cues, forcing the reranker to learn domain-invariant relevance patterns rather than memorize dataset-specific entities. To activate domain experts efficiently, $\text{R}^2\text{R}$ employs a lightweight Latent Semantic Router that probes internal representations of the frozen backbone decoder to select the best LoRA expert for each query. Extensive experiments across several reranker backbones and three domains (legal, medical, and financial) show that $\text{R}^2\text{R}$ consistently surpasses both generalist and single-domain fine-tuned baselines. Our results confirm that $\text{R}^2\text{R}$ is a model-agnostic, modular approach to domain specialization with strong cross-domain robustness.
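Stage 1 of EAG hides entities behind abstract stand-ins. A minimal sketch of that abstraction step follows. It assumes the entities and their types are supplied as a dictionary, which is hypothetical here, since the abstract does not say how the most predictive cues are identified, and this sketch simply masks every listed entity. Reusing one placeholder per surface string keeps query-passage overlap visible while hiding the memorizable name. The helpers `abstract_entities` and `eag_stages` are illustrative names, not the paper's API.

```python
import re
from typing import Dict, List, Tuple


def abstract_entities(
    texts: List[str], entities: Dict[str, str]
) -> Tuple[List[str], Dict[str, str]]:
    # `entities` maps surface string -> type label (hypothetical input; a real
    # pipeline might derive it from NER or a domain lexicon).
    mapping: Dict[str, str] = {}
    counters: Dict[str, int] = {}
    # Longest strings first, so a longer entity wins over a shorter overlapping one.
    ordered = sorted(entities, key=len, reverse=True)
    for surface in ordered:
        etype = entities[surface]
        counters[etype] = counters.get(etype, 0) + 1
        mapping[surface] = f"[{etype}_{counters[etype]}]"
    if not mapping:
        return list(texts), mapping
    # Match whole tokens only: no word character directly before or after.
    alternation = "|".join(re.escape(s) for s in ordered)
    pattern = re.compile(rf"(?<!\w)(?:{alternation})(?!\w)")
    abstracted = [pattern.sub(lambda m: mapping[m.group(0)], t) for t in texts]
    return abstracted, mapping


def eag_stages(
    query: str, passage: str, entities: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    # Stage 1 sees the abstracted pair; Stage 2 fine-tunes on the original pair.
    (q_abs, p_abs), _ = abstract_entities([query, passage], entities)
    return [("stage1", q_abs, p_abs), ("stage2", query, passage)]


# Usage with a toy legal/financial pair.
query = "Did Acme Bank breach Section 12 in 2021?"
passage = "In 2021, Acme Bank was fined under Section 12 of the Banking Act."
entities = {"Acme Bank": "ORG", "Section 12": "LAW", "Banking Act": "LAW", "2021": "DATE"}
stages = eag_stages(query, passage, entities)
```

Here the Stage 1 query becomes `Did [ORG_1] breach [LAW_2] in [DATE_1]?`. Placeholder numbering follows length order and is otherwise arbitrary; it only needs to be consistent within a query-passage pair.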
