FabricQA-Extractor: A Question Answering System to Extract Information from Documents using Natural Language Questions

Qiming Wang; Raul Castro Fernandez

TL;DR

This work tackles the challenge of extracting structured data from large volumes of unstructured text. It introduces Relation Coherence, a model that uses the relational structure of the target table to augment open-domain question answering. Relation Coherence is incorporated into FabricQA-Extractor, an end-to-end system that processes millions of documents with sub-second latency by combining offline chunking and indexing, passage ranking, and answer ranking with a backward-search coherence mechanism. Evaluations on QA-ZRE and BioNLP show improvements over strong baselines and indicate that the approach works across domains with limited training data, which highlights the importance of relational consistency in information extraction. The framework offers a scalable, transparent way to populate missing table cells from large corpora for data management tasks.

Abstract

Reading comprehension models answer questions posed in natural language when provided with a short passage of text. They present an opportunity to address a long-standing challenge in data management: the extraction of structured data from unstructured text. Consequently, several approaches are using these models to perform information extraction. However, these modern approaches leave an opportunity behind because they do not exploit the relational structure of the target extraction table. In this paper, we introduce a new model, Relation Coherence, that exploits knowledge of the relational structure to improve the extraction quality. We incorporate the Relation Coherence model as part of FabricQA-Extractor, an end-to-end system we built from scratch to conduct large scale extraction tasks over millions of documents. We demonstrate on two datasets with millions of passages that Relation Coherence boosts extraction performance and evaluate FabricQA-Extractor on large scale datasets.
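To make the idea of exploiting relational structure concrete, the following is a minimal sketch, not the paper's model. It assumes that each candidate answer from a forward question answering pass can be checked by a backward query that tries to recover the original subject, and that the two signals are mixed with a weight `alpha`. The names `Candidate`, `token_overlap`, `rerank`, `backward_qa`, and `alpha` are hypothetical, and the token-overlap measure stands in for whatever learned coherence score the system actually uses.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    answer: str      # candidate value for the missing table cell
    qa_score: float  # hypothetical confidence from the forward QA pass


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase token sets (stand-in for a learned score)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def rerank(subject: str,
           candidates: List[Candidate],
           backward_qa: Callable[[str], str],
           alpha: float = 0.5) -> List[str]:
    """Order answers by a mix of forward QA score and backward coherence."""
    scored = []
    for c in candidates:
        # Backward step (assumed): ask which subject the candidate points to.
        recovered_subject = backward_qa(c.answer)
        coherence = token_overlap(subject, recovered_subject)
        scored.append(((1 - alpha) * c.qa_score + alpha * coherence, c.answer))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [answer for _, answer in scored]


# Toy data: the forward pass slightly prefers the wrong answer.
candidates = [Candidate("Paris", 0.6), Candidate("Warsaw", 0.5)]
# Hypothetical backward lookup standing in for a second QA query.
backward = {"Paris": "Pierre Curie", "Warsaw": "Marie Curie"}.get

print(rerank("Marie Curie", candidates, backward))
```

In this toy case the backward check recovers the original subject only for the second candidate, so the coherence term moves it above the candidate that the forward pass preferred. Setting `alpha=0` reduces the ranking to the forward scores alone.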

Paper Structure (30 sections, 7 equations, 6 figures, 5 tables)

Introduction
Preliminaries
Problem Setup
Solution Landscape
Information Extraction
Reading Comprehension
Open Domain Question Answering
Summary and Contribution
Relation Coherence Model
Backward Searching
Coherence Score
OpenQA and Coherence Ensemble
FabricQA-Extractor Architecture
System Overview and Design Goals
Query Lifecycle
...and 15 more sections

Figures (6)

Figure 1: FabricQA-Extractor user interface.
Figure 2: FabricQA-Extractor's architecture overview.
Figure 3: Schematic of the Answer ranker model.
Figure 4: FabricQA-Ensemble relative improvement over FabricQA (top). FabricQA-Extractor improvement is shown at the bottom.
Figure 5: DrQA-Coherence relative improvement over DrQA-Adapted.
...and 1 more figure
