MatSciRE: Leveraging Pointer Networks to Automate Entity and Relation Extraction for Material Science Knowledge-base Construction
Ankan Mullick, Akash Ghosh, G Sai Chaitanya, Samir Ghui, Tapas Nayak, Seung-Cheol Lee, Satadeep Bhattacharjee, Pawan Goyal
TL;DR
MatSciRE addresses the automatic extraction of material-science entities and relation triplets from literature to support knowledge-base construction, focusing on battery materials and five key relations. It introduces a pointer-network–based encoder–decoder that jointly retrieves (entity1, relation, entity2) triplets, trained on a distantly supervised battery corpus and validated against a gold-standard annotation. The study compares multiple encoders and two decoding strategies, finding that a Pointer Network with MatBERT yields the strongest macro F1 (around 0.913–0.915) and outperforms ChemDataExtractor by about 6%. The work demonstrates data-efficient relation extraction for materials science, provides a web API and open-source code, and enables scalable automatic construction of battery-related knowledge bases.
Abstract
Material science literature is a rich source of factual information about various categories of entities (such as materials and compositions) and the relations between them, such as conductivity and voltage. Automatically extracting this information to build a material science knowledge base is a challenging task. In this paper, we propose MatSciRE (Material Science Relation Extractor), a Pointer Network-based encoder-decoder framework that jointly extracts entities and relations from material science articles as triplets ($entity1, relation, entity2$). Specifically, we target battery materials and identify five relations to work on: conductivity, coulombic efficiency, capacity, voltage, and energy. Our approach achieves an F1-score of 0.771, compared with 0.716 for a previous attempt using ChemDataExtractor. The overall framework of MatSciRE, which extracts material information from the literature in the form of entity-relation triplets, is shown in Fig 1.
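To make the triplet format and the F1-score concrete, the following minimal Python sketch represents extracted triplets and scores a set of predictions against a gold set. It is an illustration, not the MatSciRE implementation: the names `Triplet`, `RELATIONS`, and `triplet_f1` are hypothetical, the exact-match scoring criterion is an assumption, and the example entities and values are made up.

```python
from typing import Iterable, NamedTuple

# The five battery-related relations named in the abstract.
RELATIONS = {"conductivity", "coulombic efficiency", "capacity", "voltage", "energy"}


class Triplet(NamedTuple):
    """One (entity1, relation, entity2) fact; the class name is hypothetical."""
    entity1: str
    relation: str
    entity2: str


def triplet_f1(predicted: Iterable[Triplet], gold: Iterable[Triplet]) -> float:
    """F1 over triplets, assuming a prediction counts only on an exact match."""
    pred_set, gold_set = set(predicted), set(gold)
    true_pos = len(pred_set & gold_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative, made-up triplets (not taken from the paper's dataset).
gold = [
    Triplet("LiFePO4", "capacity", "160 mAh/g"),
    Triplet("LiFePO4", "voltage", "3.4 V"),
]
predicted = [
    Triplet("LiFePO4", "capacity", "160 mAh/g"),
    Triplet("LiFePO4", "voltage", "3.5 V"),  # wrong value, so no exact match
]

score = triplet_f1(predicted, gold)
print(f"F1 = {score:.3f}")
```

Here one of the two predictions matches the gold set, so precision and recall are both 0.5. A looser criterion (for example, partial entity matching) would give different scores, so the matching rule should be stated alongside any reported F1.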
