Extracting Protein-Protein Interactions (PPIs) from Biomedical Literature using Attention-based Relational Context Information

Gilchan Park; Sean McCorkle; Carlos Soto; Ian Blaby; Shinjae Yoo

TL;DR

This work presents a unified, multi-source PPI corpus with vetted interaction definitions, augmented by binary interaction type labels, and a Transformer-based deep learning method that uses the entities' relational context information in the relation representation to improve relation classification performance.

Abstract

Because protein-protein interactions (PPIs) are crucial to understanding living systems, harvesting these data is essential to probe disease development and discern gene/protein functions and biological processes. Some curated datasets (e.g., IntAct, BioGrid, DIP, and HPRD) contain PPI data derived from the literature and other sources. However, they are far from exhaustive, and their maintenance is a labor-intensive process. Meanwhile, machine learning methods that automate PPI knowledge extraction from the scientific literature have been limited by a shortage of appropriate annotated data. This work presents a unified, multi-source PPI corpus with vetted interaction definitions, augmented by binary interaction type labels, and a Transformer-based deep learning method that uses the entities' relational context information in the relation representation to improve relation classification performance. The model is evaluated on four widely studied biomedical relation extraction datasets, as well as this work's target PPI datasets, to assess how effective the representation is for relation extraction across varied data. Results show the model outperforms prior state-of-the-art models. The code and data are available at: https://github.com/BNLNLP/PPI-Relation-Extraction

Paper Structure (25 sections, 3 equations, 3 figures, 8 tables, 1 algorithm)

Introduction
Related Work
PPI corpora
PPI extraction methods
Additional PPI curation
Problems discovered during curation
Bias due to restricted biological focus for each set
Differences in notion of the definition of an interaction
Confusion over PPI-negative annotations
Interaction Type Annotation
Methodology
Relation Representation augmented with Attention-based Context Information
Model Architecture
Experimental Setup
Datasets
...and 10 more sections

Figures (3)

Figure 1: The relation representation consists of the entity start markers and the max-pooled relational context, which is a series of tokens chosen by the attention probability of the entities. The relation representation based on mention pooling is depicted in the appendix. $\oplus$ denotes element-wise addition. The example sentence is "Absence of alpha-syntrophin leads to structurally aberrant neuromuscular synapses deficient in utrophin." (Source: BioInfer corpus).
Figure 2: The relation representation consists of the max-pooled contextualized embeddings of the two entities and the max-pooled relational context, which is a series of tokens chosen by the attention probability of the entities. $\oplus$ denotes element-wise addition. The example sentence is "Absence of alpha-syntrophin leads to structurally aberrant neuromuscular synapses deficient in utrophin." (Source: BioInfer corpus).
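
The Figure 1 caption describes the relation representation only at a high level, so the following is a minimal NumPy sketch of one plausible reading rather than the authors' implementation. The function name `relation_representation`, the `top_k` parameter, the product-based token scoring, and the final concatenation of the two marker vectors are all assumptions made for illustration; the caption itself states only that context tokens are chosen by the entities' attention probability, that the context is max-pooled, and that it is combined with the entity start markers by element-wise addition.

```python
import numpy as np


def relation_representation(hidden, attn, e1_idx, e2_idx, top_k=3):
    """Sketch of an attention-selected relational context representation.

    hidden: (seq_len, d) contextualized token embeddings.
    attn:   (seq_len, seq_len) attention probabilities (row i attends to columns).
    e1_idx, e2_idx: positions of the two entity start markers.
    top_k:  number of context tokens to keep (hypothetical parameter).
    """
    seq_len = hidden.shape[0]
    k = min(top_k, seq_len - 2)
    if k < 1:
        raise ValueError("need at least one token besides the two entity markers")

    # Assumed scoring rule: a token matters if BOTH entities attend to it.
    score = attn[e1_idx] * attn[e2_idx]
    # Keep the entity markers themselves out of the relational context.
    score[[e1_idx, e2_idx]] = -np.inf

    # Indices of the k highest-scoring tokens, restored to sentence order.
    ctx_idx = np.sort(np.argsort(score)[-k:])

    # Max-pool the selected tokens into a single context vector.
    context = hidden[ctx_idx].max(axis=0)

    # Element-wise addition of the context to each entity start marker,
    # then concatenation of the two results (concatenation is an assumption).
    rel = np.concatenate([hidden[e1_idx] + context, hidden[e2_idx] + context])
    return rel, ctx_idx
```

Under the same assumptions, the mention-pooling variant in Figure 2 would replace `hidden[e1_idx]` and `hidden[e2_idx]` with the max-pooled embeddings of each entity's tokens.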
