Utilizing BERT for Information Retrieval: Survey, Applications, Resources, and Challenges

Jiajia Wang; Jimmy X. Huang; Xinhui Tu; Junmei Wang; Angela J. Huang; Md Tahmid Rahman Laskar; Amran Bhuiyan

TL;DR

This survey analyzes prevalent approaches that apply pretrained transformer encoders like BERT to IR, and highlights the advantages of encoder-based BERT models over recent decoder-based large language models like ChatGPT, which demand extensive computational resources.

Abstract

Recent years have witnessed a substantial increase in the use of deep learning to solve various natural language processing (NLP) problems. Early deep learning models were constrained by their sequential or unidirectional nature, so they struggled to capture contextual relationships across text inputs. The introduction of bidirectional encoder representations from transformers (BERT) led to a robust encoder for the transformer model that can understand the broader context and deliver state-of-the-art performance across various NLP tasks. This has inspired researchers and practitioners to apply BERT to practical problems, such as information retrieval (IR). A survey that comprehensively analyzes prevalent approaches that apply pretrained transformer encoders like BERT to IR can thus be useful for academia and industry. In light of this, we revisit a variety of BERT-based methods in this survey, cover a wide range of IR techniques, and group them into six high-level categories: (i) handling long documents, (ii) integrating semantic information, (iii) balancing effectiveness and efficiency, (iv) predicting the weights of terms, (v) query expansion, and (vi) document expansion. We also provide links to resources, including datasets and toolkits, for BERT-based IR systems. A key highlight of our survey is the comparison between BERT's encoder-based models and the latest generative Large Language Models (LLMs), such as ChatGPT, which rely on decoders. Despite the popularity of LLMs, we find that for specific tasks, fine-tuned BERT encoders still outperform them, and at a lower deployment cost. Finally, we summarize the outcomes of the survey and suggest directions for future research in the area.

Paper Structure (27 sections, 12 figures, 4 tables)

Introduction and Motivation
Background
Traditional Retrieval Models
Neural Ranking Models
Pretrained Language Models
Improvements to and Extensions of Pretrained Language Models
BERT for Ad-hoc IR and Challenges
Preliminary Overview of BERT
Utilizing BERT for Handling Long Documents
Utilizing BERT for Integrating Semantic Information
Aggregation Methods for Long Documents
Ranking Strategies via BERT
Weak Supervision via BERT
Extended Analysis of Comparisons and Experiments
BERT for Balancing Effectiveness and Efficiency
...and 12 more sections

Figures (12)

Figure 1: The pretraining model architectures of BERT, OpenAI GPT, and ELMo. BERT is deeply bidirectional, OpenAI’s GPT is unidirectional, and ELMo uses a shallow bidirectional approach.
Figure 2: The input representation of BERT consists of three parts: token embeddings, segment embeddings, and position embeddings. The special token [CLS] is always the first token in every sequence, and the special token [SEP] separates the two sentences in it.
Figure 3: Overview of the architecture of KeyBLD. The KeyBLD model comprises four key components: block segmentation, block selection, query-blocks representation, and document ranking. The block segmentation phase divides documents into smaller blocks, while traditional IR models assign relevance scores to each block. The query-blocks representation step concatenates the query with the most relevant blocks, and this is the input to the BERT model for document re-ranking and improved retrieval performance.
Figure 4: The architecture of Birch showcases seamless integration between Python and the Java Virtual Machine (JVM), thus enabling the efficient utilization of neural networks like BERT. The main code-entry point implemented in Python leverages the capabilities of the JVM to facilitate document retrieval through the Lucene search library. The top k candidates retrieved from JVM’s Anserini are then input to BERT for a secondary ranking process.
Figure 5: Comparison of GloVe word embeddings, ELMo representations (layer 2), and fine-tuned BERT representations (layer 5) for relevant and irrelevant documents in relation to a given query. Lighter colors represent higher similarity scores.
...and 7 more figures
