Automatic Context Pattern Generation for Entity Set Expansion

Yinghui Li; Shulin Huang; Xinwei Zhang; Qingyu Zhou; Yangning Li; Ruiyang Liu; Yunbo Cao; Hai-Tao Zheng; Ying Shen

TL;DR

This work proposes GAPA, a novel ESE framework that leverages GenerAted PAtterns to expand target entities, together with a context pattern generation module that uses autoregressive language models (e.g., GPT-2) to automatically generate high-quality context patterns for entities.

Abstract

Entity Set Expansion (ESE) is a valuable task that aims to find entities of the target semantic class described by given seed entities. Various Natural Language Processing (NLP) and Information Retrieval (IR) downstream applications have benefited from ESE due to its ability to discover knowledge. Although existing corpus-based ESE methods have made great progress, they still rely on corpora annotated with high-quality entity information, because most of them obtain context patterns from the position of an entity in a sentence. The quality of the given corpora and their entity annotations has therefore become the bottleneck that limits the performance of such methods. To overcome this dilemma and free ESE models from their dependence on entity annotation, our work explores a new ESE paradigm, namely corpus-independent ESE. Specifically, we devise a context pattern generation module that utilizes autoregressive language models (e.g., GPT-2) to automatically generate high-quality context patterns for entities. In addition, we propose GAPA, a novel ESE framework that leverages these GenerAted PAtterns to expand target entities. Extensive experiments and detailed analyses on three widely used datasets demonstrate the effectiveness of our method. The code for all our experiments is available at https://github.com/geekjuruo/GAPA.

Paper Structure (25 sections, 10 equations, 5 figures, 7 tables)

Introduction
Related Work
Entity Set Expansion
Autoregressive Language Models
Methodology
Problem Formulation
Supervision Signal Enhancement
Context Pattern Generation
Generated Patterns Guided Expansion
Summary of Methodology
Experiments
Datasets
Compared Methods
Evaluation Metric
Implementation Details
...and 10 more sections

Figures (5)

Figure 1: An example showing the expansion process of the traditional corpus-based ESE methods.
Figure 2: Overview of our proposed GAPA framework. We update the initial seed and candidate sets according to the similarity of entity representations. Two opposite GPT-2 models automatically generate the prev-text and next-text of each entity, which are then concatenated to form the context patterns. According to the similarity of the context representations obtained by BERT, we iteratively select proper entities from the candidate set and add them to the seed set, yielding the target expansion results.
Figure 3: Running efficiency analysis. For every class, we report the running time consumed by the two models when expanding 50 entities.
Figure 4: Parameter sensitivity analysis of $\text{thr}_{u}$ and $\text{thr}_{l}$ in GAPA. The MAP@50 scores of the state-of-the-art models on Wiki and APR are 0.926 and 0.960, respectively.
Figure 5: Parameter sensitivity analysis of $\text{m}$ in GAPA. The MAP@50 scores of the state-of-the-art models on Wiki and APR are 0.926 and 0.960, respectively.
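
The expansion loop described in the Figure 2 caption can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (make_pattern, expand) are hypothetical, the short hand-written vectors stand in for BERT context representations, and ranking candidates by cosine similarity to the centroid of the current seed set is an assumed selection rule. The thresholds $\text{thr}_{u}$, $\text{thr}_{l}$ and the parameter $\text{m}$ from the paper are not modeled, since this page does not define them.

```python
import math


def make_pattern(prev_text, entity, next_text):
    """Join a generated prev-text, the entity, and a generated next-text
    into a single context pattern string (hypothetical helper)."""
    parts = (prev_text.strip(), entity.strip(), next_text.strip())
    return " ".join(p for p in parts if p)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def expand(seed_vecs, cand_vecs, target_size):
    """Iteratively move the most similar candidate into the seed set.

    seed_vecs, cand_vecs: dicts mapping entity name -> context vector
    (assumed to come from some encoder; toy values are used below).
    Returns the added entities in the order they were selected.
    """
    seeds = dict(seed_vecs)   # copy so the caller's dicts are not mutated
    pool = dict(cand_vecs)
    added = []
    while pool and len(added) < target_size:
        center = centroid(list(seeds.values()))
        best = max(pool, key=lambda e: cosine(pool[e], center))
        seeds[best] = pool.pop(best)  # the seed set grows each round
        added.append(best)
    return added


if __name__ == "__main__":
    seeds = {"China": [1.0, 0.0], "Japan": [0.9, 0.1]}
    candidates = {
        "France": [0.8, 0.2],
        "Banana": [0.0, 1.0],
        "Brazil": [0.95, 0.05],
    }
    print(make_pattern("The capital of", "France", "is Paris."))
    print(expand(seeds, candidates, target_size=2))
```

Because the seed centroid is recomputed after every selection, each newly accepted entity influences later rounds, which is the iterative behavior the caption describes. A real pipeline would replace the toy vectors with encoder outputs over the generated context patterns.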
