Combining Language and Graph Models for Semi-structured Information Extraction on the Web

Zhi Hong; Kyle Chard; Ian Foster

TL;DR

GraphScholarBERT is an open-domain information extraction method based on a joint graph and language model structure. It can generalize to previously unseen domains without additional data or training, and it produces only clean extraction results matched to the search keyword.

Abstract

Relation extraction is an efficient way to mine the extraordinary wealth of human knowledge on the Web. Existing methods either rely on domain-specific training data or produce noisy outputs. We focus here on extracting targeted relations from semi-structured web pages given only a short description of the relation. We present GraphScholarBERT, an open-domain information extraction method based on a joint graph and language model structure. GraphScholarBERT can generalize to previously unseen domains without additional data or training, and it produces only clean extraction results matched to the search keyword. Experiments show that GraphScholarBERT can improve extraction F1 scores by as much as 34.8% compared with previous work in a zero-shot domain and zero-shot website setting.
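
The two evaluation settings named in the abstract can be made concrete as data splits. The sketch below is one plausible reading, not the paper's protocol: a zero-shot website split keeps the vertical (domain) fixed but tests only on websites absent from training, while a zero-shot domain split holds out a whole vertical. If each website belongs to exactly one vertical, the domain split also makes every test website unseen. The `Page` record, the function names, and the toy corpus are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    vertical: str  # domain label, e.g. "movie" (hypothetical)
    website: str   # source website within that vertical (hypothetical)
    page_id: int


def zero_shot_website_split(pages, vertical, test_sites):
    """Same vertical for train and test; test websites never appear in training."""
    in_vertical = [p for p in pages if p.vertical == vertical]
    train = [p for p in in_vertical if p.website not in test_sites]
    test = [p for p in in_vertical if p.website in test_sites]
    return train, test


def zero_shot_domain_split(pages, test_vertical):
    """An entire vertical is held out: no training page comes from it."""
    train = [p for p in pages if p.vertical != test_vertical]
    test = [p for p in pages if p.vertical == test_vertical]
    return train, test


# Toy corpus with made-up vertical and website names.
pages = [
    Page("movie", "site_a", 0), Page("movie", "site_b", 1),
    Page("movie", "site_c", 2), Page("book", "site_d", 3),
    Page("book", "site_e", 4),
]

ws_train, ws_test = zero_shot_website_split(pages, "movie", {"site_c"})
dm_train, dm_test = zero_shot_domain_split(pages, "book")
```

In both splits, the key property is an empty intersection between training and test sets on the held-out attribute (website or vertical). A model evaluated this way cannot rely on site-specific templates it saw during training.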

Paper Structure (12 sections, 2 figures, 7 tables)

This paper contains 12 sections, 2 figures, 7 tables.

Introduction
Problem Definition
Related Work
Methodology
Graph Model
The Language Model
Experimental Evaluation
Intra-vertical Extraction
Inter-vertical Extraction
Error Analysis
Ablation
Summary

Figures (2)

Figure 1: The GraphScholarBERT model architecture
Figure 2: Graph representation of a webpage. DOM node attributes propagate bi-directionally through the black edges but not the dashed orange edges ("virtual edges"), which are only used for classification.
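
Figure 2 distinguishes two kinds of edges. The following is a minimal sketch of that distinction under stated assumptions. DOM parent-child edges carry node features in both directions. Virtual edges are kept in a separate list and are used only to assemble inputs for a pair classifier. What the virtual edges actually connect in GraphScholarBERT, the single mean-aggregation step (a stand-in for whatever graph layer the model really uses), and all names below are assumptions for illustration.

```python
from collections import defaultdict


def build_graph(parent_of, virtual_pairs):
    """parent_of maps each DOM node to its parent; virtual_pairs are
    hypothetical node pairs used only for classification."""
    neighbors = defaultdict(set)
    for child, parent in parent_of.items():
        # DOM tree edges are bidirectional: features flow both up and down.
        neighbors[child].add(parent)
        neighbors[parent].add(child)
    # Virtual edges are NOT added to `neighbors`, so nothing propagates along them.
    return neighbors, list(virtual_pairs)


def propagate(features, neighbors):
    """One round of mean aggregation over DOM tree edges only
    (an assumed stand-in for the model's actual graph layer)."""
    out = {}
    for node, feat in features.items():
        group = [feat] + [features[n] for n in neighbors.get(node, ())]
        out[node] = [sum(col) / len(group) for col in zip(*group)]
    return out


def pair_inputs(features, virtual_pairs):
    """Concatenate endpoint features for each virtual edge; a classifier would score these."""
    return {(u, v): features[u] + features[v] for u, v in virtual_pairs}


# Toy page: <html><body><h1/><table><td/></table></body></html>
parent_of = {"body": "html", "h1": "body", "table": "body", "td": "table"}
neighbors, virtual = build_graph(parent_of, [("h1", "td")])
features = {"html": [0.0, 0.0], "body": [0.0, 0.0], "h1": [1.0, 0.0],
            "table": [0.0, 0.0], "td": [0.0, 1.0]}
updated = propagate(features, neighbors)
pairs = pair_inputs(updated, virtual)
```

After one round, `td` has mixed in its parent's features but not those of `h1`, even though the two share a virtual edge. The pair is still available to the classifier through `pair_inputs`, which matches the caption's point that virtual edges serve classification rather than propagation.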