CoSPLADE: Contextualizing SPLADE for Conversational Information Retrieval

Nam Le Hai, Thomas Gerald, Thibault Formal, Jian-Yun Nie, Benjamin Piwowarski, Laure Soulier

TL;DR

CoSPLADE tackles conversational information retrieval by eliminating heavy query-reformulation pipelines and instead using a sparse, context-aware first-stage ranker based on SPLADE v2. The model contextualizes the current query with both past queries and past answers through two dedicated SPLADE encoders. It is trained with a novel composite loss that includes an asymmetric component to encourage expansion from answers. A lightweight second-stage reranker (T5Mono) uses keywords derived from the first stage to re-rank documents, yielding competitive results on the TREC CAsT 2020 and 2021 benchmarks, with particularly strong recall. The approach achieves state-of-the-art first-stage recall while keeping end-to-end effectiveness close to the best participating systems, suggesting practical gains in both efficiency and retrieval quality for conversational search. The work shows that sparse, context-enriched representations can bridge the gap between reformulation-based and fully neural first-stage ranking, and may apply to other conversational QA tasks.
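
The TL;DR does not give exact formulas, so what follows is only a minimal sketch under stated assumptions, meant to make the two ideas above concrete. Sparse vectors are plain Python dicts from term to weight. The outputs of the two context encoders (past queries, past answers) are merged by summing term weights. The "asymmetric" component is modeled as a squared error that penalizes a term only when the predicted weight falls below a target weight, for instance the representation of a rewritten query. The combination rule, the loss form, and the helper names `combine` and `asymmetric_loss` are hypothetical illustrations, not the paper's definitions.

```python
from typing import Dict

# Toy sparse representation: vocabulary term -> non-negative weight.
SparseVec = Dict[str, float]


def combine(history_rep: SparseVec, answer_rep: SparseVec) -> SparseVec:
    """Merge the two encoder outputs by summing per-term weights.

    ASSUMPTION: summation is an illustrative choice of combination.
    """
    merged = dict(history_rep)
    for term, weight in answer_rep.items():
        merged[term] = merged.get(term, 0.0) + weight
    return merged


def asymmetric_loss(pred: SparseVec, target: SparseVec) -> float:
    """Squared error counted only where the prediction falls short of the target.

    Terms present in `pred` but absent from `target` (e.g. expansion terms
    taken from past answers) add nothing to the loss. ASSUMPTION: this
    illustrates an asymmetric penalty; it is not the paper's formula.
    """
    loss = 0.0
    for term, target_weight in target.items():
        shortfall = target_weight - pred.get(term, 0.0)
        if shortfall > 0.0:  # only under-weighting is penalized
            loss += shortfall ** 2
    return loss


if __name__ == "__main__":
    history = {"coffee": 1.2, "caffeine": 0.8}    # hypothetical history-encoder output
    answers = {"caffeine": 0.5, "espresso": 0.9}  # hypothetical answer-encoder output
    target = {"coffee": 1.0, "caffeine": 1.5, "health": 0.4}  # hypothetical target vector
    query = combine(history, answers)
    print(round(asymmetric_loss(query, target), 4))  # → 0.2
```

Here `espresso` is absent from the target, so it costs nothing. Only the `caffeine` shortfall (0.2² = 0.04) and the missing `health` term (0.4² = 0.16) count. A symmetric squared error would also charge for the `coffee` overshoot and for `espresso`, which would push the model away from expanding the query with answer terms.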

Abstract

Conversational search is a difficult task as it aims to retrieve documents based not only on the current user query but also on the full conversation history. Most previous methods have focused on a multi-stage ranking approach relying on query reformulation, a critical intermediate step that might lead to sub-optimal retrieval. Other approaches have tried to use a fully neural first-stage IR model, but are either zero-shot or rely on full learning-to-rank on a dataset with pseudo-labels. In this work, leveraging the CANARD dataset, we propose an innovative lightweight learning technique to train a first-stage ranker based on SPLADE. By relying on SPLADE sparse representations, we show that, when combined with a second-stage ranker based on T5Mono, the results are competitive on the TREC CAsT 2020 and 2021 tracks.
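
For readers unfamiliar with SPLADE, the sketch below shows how a SPLADE v2-style sparse representation is formed and used. Each vocabulary term's weight is the maximum, over input tokens, of log(1 + ReLU(logit)), and a document is scored by the dot product of the query and document vectors. To keep the example self-contained, a toy list of per-token logit dicts stands in for the transformer's MLM head (an assumption). `top_keywords` is a hypothetical helper that shows one way high-weight first-stage terms could be surfaced as keywords for a second-stage reranker such as T5Mono. The paper's actual keyword selection may differ.

```python
import math
from typing import Dict, List

SparseVec = Dict[str, float]


def splade_max_pool(token_logits: List[Dict[str, float]]) -> SparseVec:
    """SPLADE v2 (max) term weights: w_j = max_i log(1 + ReLU(z_ij)).

    `token_logits` holds, for each input token i, its logits z_ij over
    vocabulary terms j (a toy stand-in for the MLM head output).
    """
    rep: SparseVec = {}
    for logits in token_logits:
        for term, z in logits.items():
            weight = math.log1p(max(z, 0.0))  # log-saturated ReLU
            if weight > rep.get(term, 0.0):    # keep the max over tokens; zeros are dropped
                rep[term] = weight
    return rep


def sparse_dot(q: SparseVec, d: SparseVec) -> float:
    """Relevance score: dot product over the terms both vectors share."""
    if len(q) > len(d):
        q, d = d, q  # iterate over the smaller vector
    return sum(w * d[t] for t, w in q.items() if t in d)


def top_keywords(rep: SparseVec, k: int = 3) -> List[str]:
    """Hypothetical helper: the k highest-weighted terms (ties broken alphabetically)."""
    ranked = sorted(rep.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    query_logits = [
        {"coffee": 2.0, "drink": 0.5, "tea": -1.0},  # token 1 (hypothetical logits)
        {"coffee": 1.0, "caffeine": 1.5},            # token 2 (hypothetical logits)
    ]
    q = splade_max_pool(query_logits)
    doc = {"coffee": 1.0, "caffeine": 2.0, "milk": 0.7}  # hypothetical document vector
    print(sorted(q))                     # → ['caffeine', 'coffee', 'drink']
    print(top_keywords(q, k=2))          # → ['coffee', 'caffeine']
    print(round(sparse_dot(q, doc), 4))  # → 2.9312
```

The negative logit for `tea` is zeroed by the ReLU, so the term never enters the vector. This sparsity is what lets SPLADE vectors be served from an inverted index like bag-of-words models.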

Paper Structure (25 sections, 12 equations, 3 tables)