FecTek: Enhancing Term Weight in Lexicon-Based Retrieval with Feature Context and Term-level Knowledge
Zunran Wang, Zhonghua Li, Wei Shen, Qi Ye, Liqiang Nie
TL;DR
FecTek tackles two key gaps in lexicon-based retrieval: enriching term weight with feature-context representations and introducing term-level knowledge guidance. It introduces the Feature Context Module (FCM) and Term-level Knowledge Guidance Module (TKGM) atop a BERT backbone, combining text-level contrastive learning with term-level supervision to produce improved term weights stored in an inverted index. Empirical results on MS MARCO show state-of-the-art performance for lexicon-based retrieval (MRR@10 up to 38.2 without distillation and up to 39.2 with distillation), outperforming several dense-vector retrievers. The approach demonstrates robust gains and offers practical benefits when integrated with reranker-based distillation, highlighting the value of combining feature-context and term-level knowledge in lexical retrieval pipelines.
Abstract
Lexicon-based retrieval has gained significant popularity in text retrieval due to its efficiency and robust performance. To further improve it, researchers have incorporated state-of-the-art methods such as neural retrieval and text-level contrastive learning. Despite promising results, current lexicon-based retrieval methods have paid limited attention to the potential benefits of feature context representations and term-level knowledge guidance. In this paper, we introduce FecTek, a method built on FEature Context and TErm-level Knowledge modules. To enrich the feature context representations of term weights, we introduce the Feature Context Module (FCM), which uses BERT's representations to determine dynamic weights for each element of the embedding. Additionally, we develop a Term-level Knowledge Guidance Module (TKGM) that uses term-level knowledge to guide the modeling of term weights. Evaluation on the MS MARCO benchmark demonstrates that the proposed method outperforms previous state-of-the-art approaches.
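
To make the pipeline described above more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes one plausible reading of "dynamic weights for each element in the embedding": a sigmoid gate applied element-wise to a term's embedding, followed by a linear projection clamped to a non-negative scalar term weight. The function names (`term_weight`, `build_index`, `search`), the gate and projection parameters, and the toy weights are all hypothetical. The query side is assumed to use binary term weights, so a document's score is the sum of its stored weights for the query's terms.

```python
import math
from collections import defaultdict


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def term_weight(embedding, gate_params, proj_params):
    """Hypothetical feature-context weighting (an assumption, not the paper's FCM).

    Each embedding element gets its own gate in (0, 1); the gated embedding
    is projected to a scalar and clamped at zero so it can live in an index.
    """
    gates = [sigmoid(g * e) for g, e in zip(gate_params, embedding)]
    gated = [a * e for a, e in zip(gates, embedding)]
    score = sum(p * v for p, v in zip(proj_params, gated))
    return max(score, 0.0)


def build_index(doc_term_weights):
    """Build an inverted index: term -> list of (doc_id, weight) postings."""
    index = defaultdict(list)
    for doc_id, weights in doc_term_weights.items():
        for term, w in weights.items():
            if w > 0:  # zero-weight terms are not stored
                index[term].append((doc_id, w))
    return index


def search(index, query_terms):
    """Score documents by summing stored weights of matching query terms."""
    scores = defaultdict(float)
    for term in query_terms:
        for doc_id, w in index.get(term, []):
            scores[doc_id] += w
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy term weights standing in for model outputs (illustrative values only).
docs = {
    "d1": {"neural": 1.5, "retrieval": 0.5},
    "d2": {"retrieval": 1.0, "index": 0.0},
}
index = build_index(docs)
ranking = search(index, ["neural", "retrieval"])
```

In this sketch the expensive step (computing term weights with a neural encoder) happens once at indexing time, while query-time scoring reduces to posting-list lookups and additions, which is the efficiency argument usually made for lexicon-based retrieval.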
