Composable NLP Workflows for BERT-based Ranking and QA System
Gaurav Kumar, Murali Mohana Krishna Dandu
TL;DR
The paper addresses the challenge of building end-to-end NLP systems that span retrieval, ranking, and QA components by proposing a composable pipeline implemented in Forte. The pipeline chains a BM25 full-ranking stage, a BERT-based re-ranker, and a QA model to perform retrieval-augmented QA, and is evaluated on MS-MARCO and a Covid-19 corpus, including a Covid-19 QA extension. Key contributions include a working end-to-end system, open-source additions to Forte's components, and practical examples, along with an analysis of the latency-performance trade-off and of domain adaptation. The findings show that re-ranking substantially improves ranking metrics, while QA performance is affected by upstream errors and domain characteristics; both observations offer practical guidance for real-world search and QA systems.
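The retrieve, re-rank, then read composition summarized above can be illustrated with a minimal sketch. This is not Forte's API: `Pack`, `run_pipeline`, and the `make_*` helpers are hypothetical names introduced only for illustration, the token-overlap retriever is a stand-in for BM25, and the scoring and reading functions are placeholders for the BERT re-ranker and the QA model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Pack:
    """Hypothetical container passed between stages (not Forte's data pack)."""
    query: str
    candidates: List[str] = field(default_factory=list)
    answer: str = ""


Stage = Callable[[Pack], Pack]


def run_pipeline(pack: Pack, stages: List[Stage]) -> Pack:
    """Apply each stage in order; every stage reads and updates the pack."""
    for stage in stages:
        pack = stage(pack)
    return pack


def make_retriever(corpus: List[str], k: int) -> Stage:
    """First stage: cheap lexical scoring over the full corpus.
    Token overlap is a stand-in for BM25 (assumption)."""
    def retrieve(pack: Pack) -> Pack:
        q = set(pack.query.lower().split())
        ranked = sorted(
            corpus,
            key=lambda d: len(q & set(d.lower().split())),
            reverse=True,
        )
        pack.candidates = ranked[:k]
        return pack
    return retrieve


def make_reranker(score_fn: Callable[[str, str], float], k: int) -> Stage:
    """Second stage: re-score only the retrieved candidates.
    score_fn is a placeholder for a BERT-based relevance model."""
    def rerank(pack: Pack) -> Pack:
        ranked = sorted(
            pack.candidates,
            key=lambda d: score_fn(pack.query, d),
            reverse=True,
        )
        pack.candidates = ranked[:k]
        return pack
    return rerank


def make_reader(read_fn: Callable[[str, str], str]) -> Stage:
    """Third stage: extract an answer from the top passage.
    read_fn is a placeholder for a QA model."""
    def read(pack: Pack) -> Pack:
        pack.answer = read_fn(pack.query, pack.candidates[0]) if pack.candidates else ""
        return pack
    return read


# Toy usage with placeholder models.
corpus = [
    "BM25 is a lexical ranking function",
    "BERT is a transformer encoder",
    "Paris is the capital of France",
]
stages = [
    make_retriever(corpus, k=2),
    make_reranker(lambda q, d: float(q.split()[-1].lower() in d.lower()), k=1),
    make_reader(lambda q, d: d.split(" is ", 1)[1]),
]
result = run_pipeline(Pack(query="what is BERT"), stages)
print(result.answer)
```

Because each stage only depends on the shared container, a stage can be swapped (for example, a different re-ranker) without touching the others. The sketch also makes the upstream-error effect visible: if the retriever drops the relevant passage, the reader has no way to recover it.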
Abstract
There has been a lot of progress towards building NLP models that scale to multiple tasks. However, real-world systems contain multiple components, and it is tedious to handle cross-task interaction with varying levels of text granularity. In this work, we built an end-to-end Ranking and Question-Answering (QA) system using Forte, a toolkit for building composable NLP pipelines. We utilized state-of-the-art deep learning models such as BERT and RoBERTa in our pipeline, evaluated the performance on the MS-MARCO and Covid-19 datasets using metrics such as BLEU, MRR, and F1, and compared the results of the ranking and QA systems with their corresponding benchmark results. The modular nature of our pipeline and the low latency of the reranker make it easy to build complex NLP applications.
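For readers unfamiliar with the metrics named above, the following is a minimal sketch of MRR and token-level F1. The function names, the cutoff of 10, and the whitespace tokenization are assumptions made for illustration; the exact evaluation scripts used in this work are not specified here and may differ. BLEU is omitted because it is more involved.

```python
from collections import Counter
from typing import Sequence, Set


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[str]],
    relevant: Sequence[Set[str]],
    cutoff: int = 10,  # assumed cutoff, i.e. MRR@10
) -> float:
    """Average over queries of 1/rank of the first relevant document
    within the top `cutoff` results (0 if none is found)."""
    total = 0.0
    for ranked, rel in zip(rankings, relevant):
        for rank, doc_id in enumerate(ranked[:cutoff], start=1):
            if doc_id in rel:
                total += 1.0 / rank
                break
    return total / len(rankings) if rankings else 0.0


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall between a predicted
    answer and a reference answer (lowercased, whitespace tokens)."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

As a worked case, if the first relevant document appears at rank 2 for one query and rank 1 for another, the reciprocal ranks are 1/2 and 1, giving an MRR of 0.75.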
