Logical Consistency is Vital: Neural-Symbolic Information Retrieval for Negative-Constraint Queries

Ganlin Xu, Zhoujia Zhang, Wangyi Mei, Jiaqing Liang, Weijia Lu, Xiaodong Zhang, Zhifei Yang, Xiaofeng Ma, Yanghua Xiao, Deqing Yang

TL;DR

This work proposes NS-IR, a neuro-symbolic information retrieval method that uses first-order logic (FOL) to optimize the embeddings of naive natural language by considering the logical consistency between queries and documents. It introduces two novel techniques, logic alignment and connective constraint, to rerank candidate documents and thereby improve retrieval relevance.

Abstract

Information retrieval plays a crucial role in resource localization. Current dense retrievers find relevant documents in a corpus by comparing dense embedding vectors. These similarities depend mainly on word co-occurrence between queries and documents and overlook the real intent of the query, so dense retrievers often return numerous irrelevant documents. For complex queries such as negative-constraint queries, their retrieval performance can be catastrophic. To address this issue, we propose a neuro-symbolic information retrieval method, NS-IR, that leverages first-order logic (FOL) to optimize the embeddings of naive natural language by considering the logical consistency between queries and documents. Specifically, we introduce two novel techniques, logic alignment and connective constraint, to rerank candidate documents, thereby enhancing retrieval relevance. Furthermore, we construct a new dataset, NegConstraint, consisting of negative-constraint queries, to evaluate NS-IR's performance in such complex IR scenarios. Our extensive experiments demonstrate that NS-IR not only achieves superior zero-shot retrieval performance on web search and low-resource retrieval tasks, but also performs better on negative-constraint queries. Our source code and dataset are available at https://github.com/xgl-git/NS-IR-main.
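The failure mode described in the abstract can be illustrated with a toy example. The sketch below is not NS-IR: it does not implement logic alignment or connective constraint, and it uses no FOL. As an assumption for illustration only, it replaces dense embeddings with bag-of-words vectors and then applies a simple penalty for documents that mention an excluded term. The names bow, cosine, rerank and the penalty parameter are hypothetical.

```python
import math
from collections import Counter


def bow(text):
    """Lowercased bag-of-words vector; a crude stand-in for a dense embedding."""
    return Counter(text.lower().replace("'", " ").split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank(query, excluded, docs, penalty=1.0):
    """Return (adjusted_score, similarity, doc) tuples, best first.

    The adjusted score is the similarity minus `penalty` for every
    excluded term the document mentions (a hypothetical heuristic).
    """
    q = bow(query)
    scored = []
    for doc in docs:
        d = bow(doc)
        similarity = cosine(q, d)
        violations = sum(1 for term in excluded if term.lower() in d)
        scored.append((similarity - penalty * violations, similarity, doc))
    return sorted(scored, key=lambda s: s[0], reverse=True)


query = "poems by Ginsberg excluding Howl"
docs = [
    "Howl is a poem by Ginsberg",
    "Kaddish is a poem by Ginsberg",
]

# Similarity alone favours the document the query asked to exclude,
# because "Howl" co-occurs in both the query and that document.
by_similarity = max(docs, key=lambda doc: cosine(bow(query), bow(doc)))

# Penalising the excluded term moves the other document to the top.
reranked = rerank(query, excluded=["howl"], docs=docs)

print(by_similarity)
print(reranked[0][2])
```

In this toy setting the excluded term raises the similarity score of exactly the document the query rules out, which is the kind of mismatch between word co-occurrence and query intent that the abstract attributes to dense retrievers.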

Paper Structure

This paper contains 27 sections, 12 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: An illustration of BGE-based retrieval. The word marked in green is the co-occurrence word between the query and documents.
  • Figure 2: A retrieval example from the Google search engine.
  • Figure 3: The pipeline of our proposed NS-IR. Dashed arrows represent the retrieval stage. For clarity, the figure shows the encoding of only one document d from the top-K documents; in practice, all top-K documents are encoded together.
  • Figure 4: An example of query embedding visualization from TREC-COVID (better viewed in color): What are the observed mutations in the SARS-CoV-2 genome and how often do the mutations occur?
  • Figure 5: An example of query embedding visualization from NegConstraint (better viewed in color): What are the similarities between Ginsberg's works (excluding 'Howl') and Poe's works (excluding 'The Raven')?
  • ...and 2 more figures