DiSCo: LLM Knowledge Distillation for Efficient Sparse Retrieval in Conversational Search
Simon Lupart, Mohammad Aliannejadi, Evangelos Kanoulas
TL;DR
DiSCo addresses the efficiency cost of conversational search (CS) by removing LLM inference from the retrieval path. It introduces a score-level distillation framework that unifies context modeling and retrieval: the student learns to reproduce the similarity scores between rewritten queries and documents rather than being forced to match a fixed query representation. The approach supports multiple teachers through a fusion mechanism and yields state-of-the-art results on five CS datasets in both in-domain and out-of-domain settings, with notable gains in recall and better control over sparsity. DiSCo is robust to teacher quality and reduces reliance on LLM calls while maintaining high retrieval performance.
Abstract
Conversational Search (CS) involves retrieving relevant documents from a corpus while considering the conversational context, integrating retrieval with context modeling. Recent advancements in Large Language Models (LLMs) have significantly enhanced CS by enabling query rewriting based on conversational context. However, employing LLMs during inference poses efficiency challenges. Existing solutions mitigate this issue by distilling embeddings derived from human-rewritten queries, focusing primarily on learning the context modeling task. These methods, however, often separate the contrastive retrieval task from the distillation process, treating it as an independent loss term. To overcome these limitations, we introduce DiSCo (Distillation of Sparse Conversational retrieval), a novel approach that unifies retrieval and context modeling through a relaxed distillation objective. Instead of relying exclusively on representation learning, our method distills similarity scores between conversations and documents, providing more freedom in the representation space and better leveraging the contrastive nature of document relevance. Extensive experiments on Learned Sparse Retrieval (LSR) across five CS datasets demonstrate that DiSCo achieves substantial improvements in both in-domain and out-of-domain retrieval tasks, with up to a six-point gain in recall on out-of-domain datasets over state-of-the-art methods. Additionally, DiSCo employs a multi-teacher distillation strategy, using multiple LLMs as teachers, which further enhances performance and surpasses the individual teachers in in-domain settings. Finally, an analysis of model sparsity reveals that DiSCo allows for more effective control over the sparsity of the trained models.
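The difference between distilling representations and distilling scores can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the exact loss or fusion rule, so the KL divergence over softmax-normalized scores, the mean-based `fuse_teachers`, and all function names and toy vectors below are assumptions chosen for illustration.

```python
import math

def dot(u, v):
    """Dot-product similarity between two (sparse-style) vectors."""
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def representation_loss(student_vec, teacher_vec):
    """Representation-level objective: MSE between the student's conversation
    vector and the teacher's rewritten-query vector."""
    return sum((s - t) ** 2 for s, t in zip(student_vec, teacher_vec)) / len(student_vec)

def score_distillation_loss(student_vec, teacher_scores, doc_vecs):
    """Score-level objective (assumed form): KL(teacher || student) between the
    softmax-normalized score distributions over the candidate documents."""
    student_scores = [dot(student_vec, d) for d in doc_vecs]
    p = softmax(teacher_scores)
    q = softmax(student_scores)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def fuse_teachers(per_teacher_scores):
    """Hypothetical fusion rule: average each document's score across teachers."""
    n = len(per_teacher_scores)
    return [sum(col) / n for col in zip(*per_teacher_scores)]

# Toy setup: two documents in a three-term vocabulary.
doc_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
teacher_vec = [2.0, 1.0, 0.0]   # teacher encoding of the rewritten query
student_vec = [2.0, 1.0, 5.0]   # student encoding of the raw conversation

teacher_scores = [dot(teacher_vec, d) for d in doc_vecs]
rep_loss = representation_loss(student_vec, teacher_vec)
score_loss = score_distillation_loss(student_vec, teacher_scores, doc_vecs)
fused = fuse_teachers([[2.0, 1.0], [4.0, 3.0]])
```

In this toy case the student activates a third term that neither document contains, so it ranks the documents exactly as the teacher does. The score-level loss is zero while the representation-level loss is not, which is the extra freedom in the representation space that the abstract describes.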
