Constructing Set-Compositional and Negated Representations for First-Stage Ranking

Antonios Minas Krasakis, Andrew Yates, Evangelos Kanoulas

TL;DR

The paper tackles first-stage ranking under complex information needs expressed with set operations and negations. It introduces a zero-shot framework that builds set-compositional query representations by applying Linear Algebra Operations (LAO) to lexical, Learned Sparse Retrieval (LSR) vectors, including Disentangled Negation for $A\setminus B$ and Combined Pseudo-Terms for $A\cap B$. It further improves LSR by enabling negative term scores, and shows that zero-shot LAO can rival supervised compositional retrievers, while both Dense and LSR models struggle with modeling $A\cap B$ under domain shift. The work advances interpretable, lexical-first first-stage ranking and suggests directions for data-efficient negation learning and query decomposition.

Abstract

Set-compositional and negated queries are crucial for expressing complex information needs and enable the discovery of niche items such as "Books about non-European monarchs". Despite recent advances in LLMs, first-stage ranking remains challenging because documents and queries must be encoded independently of each other. This limitation calls for constructing compositional query representations that encapsulate logical operations or negations and can be used to match relevant documents effectively. In the first part of this work, we explore constructing such representations in a zero-shot setting using vector operations between lexically grounded Learned Sparse Retrieval (LSR) representations. Specifically, we introduce Disentangled Negation, which penalizes only the negated parts of a query, and a Combined Pseudo-Term approach that enhances LSRs' ability to handle intersections. We find that our zero-shot approach is competitive with, and often outperforms, retrievers fine-tuned on compositional data, highlighting certain limitations of LSR and Dense Retrievers. Finally, we address some of these limitations and improve LSRs' representational power for negation by allowing them to attribute negative term scores and thereby penalize documents containing the negated terms.
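The abstract does not spell out the exact formulation, so the following is only a toy sketch of the general idea of negation via vector operations on sparse lexical vectors. It assumes sparse vectors are stored as term-to-weight dictionaries, and it assumes that "penalizes only the negated parts" can be read as giving a negative weight only to terms that appear in the negated sub-query but not in the positive one. The function names, the `alpha` parameter, and all term weights are hypothetical and are not taken from the paper.

```python
from typing import Dict

SparseVec = Dict[str, float]  # term -> weight (assumed toy representation)


def naive_negation(q_pos: SparseVec, q_neg: SparseVec, alpha: float = 1.0) -> SparseVec:
    """Plain vector subtraction: q_pos - alpha * q_neg over all terms."""
    terms = set(q_pos) | set(q_neg)
    return {t: q_pos.get(t, 0.0) - alpha * q_neg.get(t, 0.0) for t in terms}


def disentangled_negation(q_pos: SparseVec, q_neg: SparseVec, alpha: float = 1.0) -> SparseVec:
    """Assumed reading: keep positive weights intact and penalize only
    terms that occur in the negated sub-query but not in the positive one."""
    out = dict(q_pos)
    for term, weight in q_neg.items():
        if term not in q_pos:
            out[term] = -alpha * weight
    return out


def score(query: SparseVec, doc: SparseVec) -> float:
    """Dot product between a sparse query and a sparse document vector."""
    return sum(w * doc.get(t, 0.0) for t, w in query.items())


# Hypothetical weights for "Books about non-European monarchs".
q_pos = {"book": 1.0, "monarch": 1.5}      # A: books about monarchs
q_neg = {"european": 1.2, "monarch": 0.8}  # B: European monarchs

doc_non_european = {"book": 1.0, "monarch": 1.0, "japan": 1.0}
doc_european = {"book": 1.0, "monarch": 1.0, "european": 1.0}

naive = naive_negation(q_pos, q_neg)
disentangled = disentangled_negation(q_pos, q_neg)

for name, q in [("naive", naive), ("disentangled", disentangled)]:
    print(name, score(q, doc_non_european), score(q, doc_european))
```

In this toy setup, plain subtraction also lowers the weight of the shared term "monarch", which the positive part of the query needs, whereas the second variant leaves it untouched and only penalizes "european". Negative query weights of this kind only help if the scoring function accepts them, which is the capability the last sentence of the abstract refers to.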

Paper Structure (21 sections, 16 equations, 3 figures, 7 tables)

Figures (3)

  • Figure 1: A zero-shot framework for constructing compositional query representations with Linear Algebra Operations.
  • Figure 2: Splade activation functions.
  • Figure 3: Performance of zero-shot negation methods, at different levels of interference between positive and negative query.