FecTek: Enhancing Term Weight in Lexicon-Based Retrieval with Feature Context and Term-level Knowledge

Zunran Wang, Zhonghua Li, Wei Shen, Qi Ye, Liqiang Nie

TL;DR

FecTek addresses two underexplored directions in lexicon-based retrieval: enriching term weights with feature-context representations and guiding term-weight learning with term-level knowledge. It adds a Feature Context Module (FCM) and a Term-level Knowledge Guidance Module (TKGM) on top of a BERT backbone and combines text-level contrastive learning with term-level supervision; the resulting term weights are stored in an inverted index. On MS MARCO, it achieves state-of-the-art results among lexicon-based retrievers (MRR@10 up to 38.2 without distillation and up to 39.2 with distillation) and outperforms several dense-vector retrievers. The gains persist when the model is combined with reranker-based distillation, highlighting the value of feature context and term-level knowledge in lexical retrieval pipelines.
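The TL;DR notes that the learned term weights are stored in an inverted index. The sketch below shows, under generic assumptions, how such weights are commonly used at query time in learned sparse retrieval: each term maps to a posting list of (document, weight) pairs, and a document's score is the sum of query weight times document weight over the terms it shares with the query. The function names `build_inverted_index` and `search` and the toy weights are hypothetical; this is a standard lexical scoring scheme, not code from the paper.

```python
from collections import defaultdict

# Hypothetical helpers for illustration only; not FecTek's implementation.
Postings = dict[str, list[tuple[str, float]]]


def build_inverted_index(doc_weights: dict[str, dict[str, float]]) -> Postings:
    """Map each term to a posting list of (doc_id, weight) pairs."""
    index: dict[str, list[tuple[str, float]]] = defaultdict(list)
    for doc_id, weights in doc_weights.items():
        for term, weight in weights.items():
            if weight > 0.0:  # store only non-zero weights so the index stays sparse
                index[term].append((doc_id, weight))
    return dict(index)


def search(index: Postings, query_weights: dict[str, float], k: int = 10) -> list[tuple[str, float]]:
    """Score each document by summing query_weight * doc_weight over shared terms."""
    scores: dict[str, float] = defaultdict(float)
    for term, q_weight in query_weights.items():
        for doc_id, d_weight in index.get(term, []):  # unseen terms contribute nothing
            scores[doc_id] += q_weight * d_weight
    # Highest score first; ties keep insertion order because sorted() is stable.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


# Toy term weights standing in for model outputs (made-up numbers).
docs = {
    "d1": {"lexicon": 1.5, "retrieval": 0.5},
    "d2": {"retrieval": 2.0, "dense": 1.0},
    "d3": {"bert": 0.75},
}
index = build_inverted_index(docs)
print(search(index, {"lexicon": 1.0, "retrieval": 0.5}))  # [('d1', 1.75), ('d2', 1.0)]
```

Only documents sharing at least one term with the query are ever touched, which is where the efficiency of lexicon-based retrieval comes from; improving the term weights changes the scores without changing this lookup.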

Abstract

Lexicon-based retrieval has gained significant popularity in text retrieval due to its efficiency and robustness. To further improve lexicon-based retrieval, researchers have incorporated state-of-the-art techniques such as neural retrieval and text-level contrastive learning. Despite these promising results, current lexicon-based retrieval methods have paid little attention to the potential benefits of feature context representations and term-level knowledge guidance. In this paper, we introduce FecTek, a method built on FEature Context and TErm-level Knowledge modules. To enrich the feature context representations of term weights, we introduce the Feature Context Module (FCM), which uses BERT's representations to determine a dynamic weight for each element of the embedding. Additionally, we develop a Term-level Knowledge Guidance Module (TKGM) that uses term-level knowledge to guide the modeling of term weights. Evaluation on the MS MARCO benchmark shows that the proposed method outperforms previous state-of-the-art approaches.
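The abstract says the FCM uses BERT's representations to assign a dynamic weight to each element of the embedding, and the Figure 3 caption indicates a point-wise multiplication. A minimal sketch of that idea follows, assuming the per-element weights come from a single linear layer followed by a sigmoid, and assuming a linear projector with ReLU turns the enriched embedding into one non-negative term weight. The gate form, the projector, and all function names are hypothetical illustrations, not the paper's exact architecture.

```python
import math

Vector = list[float]
Matrix = list[list[float]]


def matvec(w: Matrix, x: Vector, b: Vector) -> Vector:
    """Return W x + b, with W stored row by row."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i for row, b_i in zip(w, b)]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def feature_context(h: Vector, w_gate: Matrix, b_gate: Vector) -> Vector:
    """Assumed FCM form: element weights g = sigmoid(W h + b), output h point-wise-times g."""
    gate = [sigmoid(z) for z in matvec(w_gate, h, b_gate)]
    return [h_i * g_i for h_i, g_i in zip(h, gate)]


def term_weight(h: Vector, w_proj: Vector, b_proj: float) -> float:
    """Hypothetical projector: linear map to a scalar, then ReLU to keep weights non-negative."""
    return max(0.0, sum(w_i * h_i for w_i, h_i in zip(w_proj, h)) + b_proj)


h = [1.0, -2.0, 0.5]  # toy stand-in for one token's BERT embedding
zero_gate = [[0.0] * 3 for _ in range(3)]
enriched = feature_context(h, zero_gate, [0.0, 0.0, 0.0])
print(enriched)  # [0.5, -1.0, 0.25]: every gate is sigmoid(0) = 0.5
print(term_weight(enriched, [1.0, 0.0, 2.0], 0.0))  # 1.0
```

Because the gate is computed from the embedding itself, each element is rescaled differently for each token; with learned gate parameters, uninformative dimensions can be suppressed before the projector produces the term weight.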

Paper Structure

This paper contains 21 sections, 8 equations, 3 figures, and 3 tables.

Figures (3)

  • Figure 1: Lexicon-based retrieval learning architectures. (a) The mainstream architectures. (b) Our FecTek architecture, which introduces feature context representations and term-level knowledge guidance.
  • Figure 2: The architecture of FecTek. FecTek uses BERT to extract spatial context representations of each token. It then uses two branches to learn text-level and term-level knowledge, respectively. The text-level branch consists of the FCM and the Projector module, while the term-level branch includes the Indicator module and an additional Projector module. The term-level branch is used only during training.
  • Figure 3: The feature context module. $\otimes$ denotes point-wise multiplication.
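Figure 2 describes a text-level branch and a term-level branch whose supervision applies only during training. A minimal sketch of how two such losses could be combined follows, assuming an InfoNCE-style contrastive loss over lexical query-passage scores for the text-level branch and a binary cross-entropy on per-token indicator probabilities for the term-level branch. The 0/1 term labels, the weight `lam`, and all function names are hypothetical; this section does not give the paper's exact loss formulations.

```python
import math


def contrastive_loss(pos_score: float, neg_scores: list[float], temperature: float = 1.0) -> float:
    """InfoNCE-style text-level loss: -log softmax of the positive among positive + negatives."""
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]


def term_level_loss(probs: list[float], labels: list[int], eps: float = 1e-12) -> float:
    """Mean binary cross-entropy between per-token indicator probabilities and 0/1 labels."""
    if not probs:
        return 0.0
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def training_loss(
    pos_score: float,
    neg_scores: list[float],
    probs: list[float],
    labels: list[int],
    lam: float = 1.0,
) -> float:
    """Text-level loss plus lam times the term-level loss (the term-level part is training-only)."""
    return contrastive_loss(pos_score, neg_scores) + lam * term_level_loss(probs, labels)


# Toy numbers: scores stand in for lexical query-passage scores; probs/labels are hypothetical.
print(training_loss(2.0, [1.0, 0.5], probs=[0.9, 0.2], labels=[1, 0], lam=0.5))
```

At inference time only the text-level path would be kept, matching the caption's note that the term-level branch exists only for training; its role in this sketch is to add an extra gradient signal on individual term predictions.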