FecTek: Enhancing Term Weight in Lexicon-Based Retrieval with Feature Context and Term-level Knowledge

Zunran Wang; Zhonghua Li; Wei Shen; Qi Ye; Liqiang Nie

TL;DR

FecTek tackles two gaps in lexicon-based retrieval: term weights that lack feature-context representations, and the absence of term-level knowledge guidance. It adds a Feature Context Module (FCM) and a Term-level Knowledge Guidance Module (TKGM) on top of a BERT backbone, combining text-level contrastive learning with term-level supervision to produce improved term weights that are stored in an inverted index. On MS MARCO, it reports state-of-the-art results for lexicon-based retrieval (MRR@10 up to 38.2 without distillation and up to 39.2 with distillation), outperforming several dense-vector retrievers. The gains carry over when the model is combined with reranker-based distillation, which points to the value of pairing feature context with term-level knowledge in lexical retrieval pipelines.
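To make "term weights stored in an inverted index" concrete, the sketch below shows the usual lexicon-based scoring scheme: each term maps to a posting list of (document, weight) pairs, and a document's score is the sum of query-weight times document-weight over shared terms. This is a generic illustration, not FecTek's code; the function names, documents, and weights are all hypothetical.

```python
from collections import defaultdict


def build_inverted_index(doc_term_weights):
    """Map each term to a posting list of (doc_id, weight) pairs."""
    index = defaultdict(list)
    for doc_id, weights in doc_term_weights.items():
        for term, weight in weights.items():
            if weight > 0:  # zero-weight terms are not indexed
                index[term].append((doc_id, weight))
    return index


def score(query_term_weights, index):
    """Rank documents by the sum of query_weight * doc_weight over shared terms."""
    scores = defaultdict(float)
    for term, query_weight in query_term_weights.items():
        for doc_id, doc_weight in index.get(term, []):
            scores[doc_id] += query_weight * doc_weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical learned term weights (in practice these come from the model).
docs = {
    "d1": {"neural": 1.5, "retrieval": 2.0},
    "d2": {"retrieval": 0.5, "index": 1.0},
}
index = build_inverted_index(docs)
ranking = score({"retrieval": 1.0, "index": 2.0}, index)
```

Only the posting lists of the query's terms are touched at query time, which is where the efficiency of lexicon-based retrieval comes from.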

Abstract

Lexicon-based retrieval has gained significant popularity in text retrieval due to its efficient and robust performance. To further enhance its performance, researchers have been incorporating state-of-the-art methodologies such as neural retrieval and text-level contrastive learning. Despite these promising outcomes, current lexicon-based retrieval methods have paid limited attention to the potential benefits of feature context representations and term-level knowledge guidance. In this paper, we introduce FecTek, a method built on FEature Context and TErm-level Knowledge modules. To enrich the feature context representations of term weight, we introduce the Feature Context Module (FCM), which leverages BERT's representation to determine dynamic weights for each element in the embedding. Additionally, we develop a Term-level Knowledge Guidance Module (TKGM) that uses term-level knowledge to guide the modeling of term weight. Evaluation on the MS MARCO benchmark demonstrates the method's superiority over previous state-of-the-art approaches.

Paper Structure (21 sections, 8 equations, 3 figures, 3 tables)

Introduction
Related Work
Dense-vector Retriever
Contextual Information for Lexicon-based Retriever
Term-level Label Assignment for Lexicon-based Retriever
Reranker-taught Retriever
Methodology
Network Architecture
FCM
TKGM
Loss
Experiments
Implementation Details
Datasets & Metrics
Experimental Setups
...and 6 more sections

Figures (3)

Figure 1: Lexicon-based retrieval learning architectures. (a) The mainstream architectures. (b) Our FecTek architecture, which introduces feature context representations and term-level knowledge guidance.
Figure 2: The architecture of FecTek. FecTek leverages BERT to extract spatial context representations of each token. It then uses two branches to learn text-level and term-level knowledge, respectively. The text-level branch consists of the FCM and a Projector module, while the term-level branch consists of the Indicator module and an additional Projector module. The term-level branch is used only during training.
Figure 3: The feature context module. $\otimes$ denotes the point-wise multiplication.
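As a rough illustration of the idea behind Figure 3, the sketch below re-weights each element of a token embedding by a gate computed from that same embedding and applies it with point-wise multiplication. The captions and abstract only state that dynamic per-element weights are derived from BERT's representation and applied with $\otimes$; the sigmoid gate, the single linear layer, and the names `feature_context_gate`, `W`, and `b` are assumptions made for this example, not the paper's specification.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def feature_context_gate(h, W, b):
    """Re-weight each element of one token embedding h.

    h: list of d floats (stand-in for a BERT token representation).
    W: d x d gate weights and b: d gate biases (hypothetical parameters).
    """
    # Assumed gate: one linear layer followed by a sigmoid, one value per element.
    gate = [
        sigmoid(sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i)
        for row, b_i in zip(W, b)
    ]
    # Point-wise multiplication of gate and embedding (the ⊗ in Figure 3).
    return [g_i * h_i for g_i, h_i in zip(gate, h)]


# Toy input: with zero weights and biases every gate value is 0.5.
h = [2.0, -4.0, 1.0]
W = [[0.0] * 3 for _ in range(3)]
b = [0.0, 0.0, 0.0]
gated = feature_context_gate(h, W, b)
```

In a trained model the gate would differ per element and per token, so some embedding dimensions are emphasized and others suppressed before the Projector maps them to a term weight.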
