Can we repurpose multiple-choice question-answering models to rerank retrieved documents?
Jasper Kyle Catapang
TL;DR
This paper investigates reranking retrieved documents by repurposing multiple-choice question-answering (MCQA) models. It proposes R*, a lightweight dual-purpose model that can operate as both an MCQA system and a cross-encoder. Trained on a balanced MS MARCO corpus with hard-negative sampling, R* aims to closely approximate cross-encoder performance while reducing computational costs for retrieval-augmented generation (RAG) systems. Results on MS MARCO and additional datasets show that MCQA-based rerankers can match or exceed the baseline and rival some cross-encoder methods, with significant gains in top-relevance measures such as Recall@1 and MRR@10. The work highlights the practicality of MCQA as a reranking primitive, contributes a publicly available implementation, and discusses limitations such as passage-length bias and domain generalization, pointing to future work on broader datasets and comparisons with commercial rerankers.
Abstract
Yes, repurposing multiple-choice question-answering (MCQA) models for document reranking is both feasible and valuable. This preliminary work builds on mathematical parallels between MCQA decision-making and cross-encoder semantic relevance assessment, which motivate R*, a proof-of-concept model that unifies the two approaches. Designed to assess document relevance, R* shows how MCQA principles can improve reranking in information retrieval (IR) and retrieval-augmented generation (RAG) systems, ultimately enhancing search and dialogue in AI-powered systems. Experimental validation shows that R* improves retrieval accuracy while remaining lightweight, providing a practical prototype of MCQA-based reranking.
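The parallel between MCQA and cross-encoder scoring can be illustrated with a minimal sketch. It is not the paper's derivation or the R* implementation: it assumes a model that emits one scalar logit per (query, passage) pair, and the logit values and function names below are hypothetical. An MCQA head normalizes the logits jointly with a softmax over the candidate "choices", while a cross-encoder head scores each pair independently with a sigmoid. Both transforms are strictly increasing in the logit, so they induce the same ranking.

```python
import math

def softmax(logits):
    """MCQA-style normalization: probabilities over the candidate set."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    """Cross-encoder-style score: independent relevance per pair."""
    return 1.0 / (1.0 + math.exp(-x))

def rank_as_mcqa(logits):
    """Rank candidate indices by softmax probability, best first."""
    probs = softmax(logits)
    return sorted(range(len(logits)), key=lambda i: probs[i], reverse=True)

def rank_as_cross_encoder(logits):
    """Rank candidate indices by sigmoid score, best first."""
    scores = [sigmoid(x) for x in logits]
    return sorted(range(len(logits)), key=lambda i: scores[i], reverse=True)

# Hypothetical logits for four retrieved passages (illustrative values only).
candidate_logits = [0.3, 2.1, -1.0, 1.4]

mcqa_order = rank_as_mcqa(candidate_logits)
ce_order = rank_as_cross_encoder(candidate_logits)
print(mcqa_order, ce_order)
```

Under these assumptions the two views differ only in how scores are normalized, not in the order they produce, which is one way to read the claim that an MCQA model can serve as a reranker.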
