DiSCo: LLM Knowledge Distillation for Efficient Sparse Retrieval in Conversational Search

Simon Lupart, Mohammad Aliannejadi, Evangelos Kanoulas

TL;DR

DiSCo tackles efficiency challenges in conversational search by avoiding heavy LLM inference for retrieval. It introduces a score-level distillation framework that unifies context modeling and retrieval, distilling similarity scores between rewritten queries and documents instead of enforcing a fixed representation. The approach supports multiple teachers and a fusion mechanism, yielding state-of-the-art results on five CS datasets in both in-domain and out-of-domain settings, with notable gains in recall and improved sparsity control. DiSCo demonstrates robustness to teacher quality and offers practical efficiency by reducing reliance on LLM calls while maintaining high retrieval performance.

Abstract

Conversational Search (CS) involves retrieving relevant documents from a corpus while considering the conversational context, integrating retrieval with context modeling. Recent advancements in Large Language Models (LLMs) have significantly enhanced CS by enabling query rewriting based on conversational context. However, employing LLMs during inference poses efficiency challenges. Existing solutions mitigate this issue by distilling embeddings derived from human-rewritten queries, focusing primarily on learning the context modeling task. These methods, however, often separate the contrastive retrieval task from the distillation process, treating it as an independent loss term. To overcome these limitations, we introduce DiSCo (Distillation of Sparse Conversational retrieval), a novel approach that unifies retrieval and context modeling through a relaxed distillation objective. Instead of relying exclusively on representation learning, our method distills similarity scores between conversations and documents, providing more freedom in the representation space and better leveraging the contrastive nature of document relevance. Extensive experiments on Learned Sparse Retrieval (LSR) across five CS datasets demonstrate that DiSCo substantially improves both in-domain and out-of-domain retrieval, with up to a six-point gain in recall on out-of-domain datasets over state-of-the-art methods. Additionally, DiSCo employs a multi-teacher distillation strategy, using multiple LLMs as teachers, which further enhances performance and surpasses the individual teachers in in-domain settings. Finally, an analysis of model sparsity reveals that DiSCo allows for more effective control over the sparsity of the trained models.
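The contrast between distilling representations and distilling similarity scores can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy three-dimensional vectors, the dot-product similarity, the mean squared error used for the representation loss, and the KL divergence over softmax-normalised scores used for the score loss. The text above does not specify the exact loss, so this is not the authors' implementation.

```python
import math


def dot(u, v):
    """Dot-product similarity between two vectors (assumed similarity function)."""
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def representation_loss(student_q, teacher_q):
    """Representation-level distillation: mean squared error between embeddings."""
    return sum((s - t) ** 2 for s, t in zip(student_q, teacher_q)) / len(teacher_q)


def score_loss(student_q, teacher_q, docs):
    """Score-level distillation: KL(teacher || student) over query-document scores."""
    p = softmax([dot(teacher_q, d) for d in docs])
    q = softmax([dot(student_q, d) for d in docs])
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical toy data: two documents and two query representations.
docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
teacher_q = [2.0, 1.0, 0.0]  # embedding of the rewritten query
student_q = [2.0, 1.0, 3.0]  # a different embedding of the full conversation

rep = representation_loss(student_q, teacher_q)
sc = score_loss(student_q, teacher_q, docs)
print(rep, sc)
```

In this toy setup the student differs from the teacher only along a direction that no document uses, so the representation loss is 3.0 while the score loss is 0.0. The score-level objective leaves that direction free, which is the kind of relaxation the abstract describes.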

Paper Structure

This paper contains 14 sections, 8 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: Similarity Score Distillation in $\mathbb{R}^2$. Existing loss functions force the representation of the full conversation to converge to a single rewrite representation (convDR, red arrow). If we instead take document $\mathbf{d}$ as the anchor, infinitely many representations other than $\mathbf{E_{q}(q_{rw})}$ have the same similarity with $\mathbf{d}$ ($\mathbf{Y}$, blue hyperplane). DiSCo relaxes the objective, allowing the model to converge to the best representation on the $\mathbf{Y}$ hyperplane (green arrows).
  • Figure 2: DiSCo distills LLM-rewritten queries through a contrastive objective. Previous works distilled the representations themselves, while our approach distills similarities with documents from the corpus, relaxing the learning objective.
  • Figure 3: Distillation process. The first step stores scores from the rewritten queries with documents from the corpus. Then the student query encoder $\tilde{E}_q$ is trained to reproduce the output scores of the teacher.
  • Figure 4: Teacher Selection on QReCC. (Left) SPLADE Teacher Models with different LLM QR. (Right) DiSCo Students when trained with multi-teachers. Best set in red ($T_2$ and $T_3$).
  • Figure 5: Effectiveness-efficiency trade-off on TopiOCQA. Sparser representations have lower latency but also lower MRR. Here, avg q_len is the average number of activated tokens in the conversation representations, used as an indicator of efficiency.
  • ...and 1 more figure
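The two-step process in the Figure 3 caption (store teacher scores, then train the student to reproduce them) can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's training code: the student encoder is reduced to a single trainable query vector, similarity is a dot product, the loss is a mean squared error on scores, and the `fuse` helper that averages scores over teachers is a hypothetical stand-in for the fusion mechanism, which the captions do not describe.

```python
def dot(u, v):
    """Dot-product similarity (assumed similarity function)."""
    return sum(a * b for a, b in zip(u, v))


def cache_teacher_scores(rewrite_embeddings, docs):
    """Step 1: score every document once per teacher rewrite and store the results."""
    return [[dot(q, d) for d in docs] for q in rewrite_embeddings]


def fuse(teacher_scores):
    """Hypothetical fusion: average the per-document scores over all teachers."""
    n = len(teacher_scores)
    return [sum(col) / n for col in zip(*teacher_scores)]


def train_student(student_q, docs, target, lr=0.1, steps=500):
    """Step 2: gradient descent on the squared error between student and stored scores."""
    q = list(student_q)
    for _ in range(steps):
        errors = [dot(q, d) - t for d, t in zip(docs, target)]
        grad = [
            2 * sum(e * d[i] for e, d in zip(errors, docs)) / len(docs)
            for i in range(len(q))
        ]
        q = [qi - lr * g for qi, g in zip(q, grad)]
    return q


# Hypothetical toy data: two documents and two teacher rewrite embeddings.
docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
teachers = [[2.0, 1.0, 0.0], [4.0, 3.0, 0.0]]

target = fuse(cache_teacher_scores(teachers, docs))  # averaged teacher scores
student = train_student([0.0, 0.0, 5.0], docs, target)
student_scores = [dot(student, d) for d in docs]
print(target, student_scores, student)
```

After training, the student reproduces the stored scores while its third coordinate keeps its initial value, because no document constrains that direction. Under these assumptions, this mirrors the idea in the Figure 1 caption that many representations share the same similarity with a document.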