Knowledge Augmented Entity and Relation Extraction for Legal Documents with Hypergraph Neural Network
Binglin Wu, Xianneng Li
TL;DR
This work tackles entity and relation extraction in legal documents by integrating domain-specific knowledge into a hypergraph-based joint reasoning framework. The proposed Legal-KAHRE architecture combines a candidate span generator, a knowledge-augmented encoder that draws on a legal dictionary built for drug-related judgment documents, and a domain-tailored hypergraph that encodes joint crimes and combined punishment for multiple crimes to enable higher-order inference. Empirical results on the CAIL2022 information extraction dataset show that Legal-KAHRE outperforms strong baselines across encoders, supporting the value of knowledge augmentation and domain-aware hypergraph structures for legal information extraction. The approach contributes to the construction of structured judicial knowledge graphs and supports downstream legal AI tasks.
Abstract
As Chinese judicial institutions continue to digitize, they have accumulated a substantial volume of electronic legal documents. To unlock the value of this data, entity and relation extraction for legal documents has emerged as a crucial task. However, existing methods often lack domain-specific knowledge and fail to account for the unique characteristics of the judicial domain. In this paper, we propose Legal-KAHRE, an entity and relation extraction algorithm based on a hypergraph neural network, for drug-related judgment documents. First, we design a candidate span generator based on a neighbor-oriented packing strategy and a biaffine mechanism, which identifies spans likely to contain entities. Second, we construct a legal dictionary that captures judicial domain knowledge and integrate it into the text encoding representation using multi-head attention. Third, we incorporate domain-specific cases, such as joint crimes and combined punishment for multiple crimes, into the design of the hypergraph structure. Finally, we employ a hypergraph neural network for higher-order inference via message passing. Experimental results on the CAIL2022 information extraction dataset demonstrate that our method significantly outperforms existing baseline models.
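To make the idea of higher-order inference via message passing more concrete, the following is a minimal sketch of one round of hypergraph message passing. It is not the authors' implementation: the abstract does not specify the update rule, so the sketch assumes plain mean aggregation with no learned weights, and the function names, node names, and feature values are all hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def hypergraph_step(
    node_feats: Dict[str, Vector],
    hyperedges: Dict[str, List[str]],
) -> Dict[str, Vector]:
    """One round of node -> hyperedge -> node mean aggregation (assumed rule)."""
    # Stage 1: each hyperedge summarizes the features of its member nodes.
    edge_feats = {
        edge: mean([node_feats[n] for n in members])
        for edge, members in hyperedges.items()
    }
    # Stage 2: each node averages its own feature with the messages
    # from every hyperedge it belongs to.
    updated = {}
    for node, feat in node_feats.items():
        incident = [
            edge_feats[edge]
            for edge, members in hyperedges.items()
            if node in members
        ]
        updated[node] = mean([feat] + incident)
    return updated


# Hypothetical toy case: two defendants jointly involved with one drug entity.
node_feats = {
    "defendant_A": [1.0, 0.0],
    "defendant_B": [0.0, 1.0],
    "drug": [1.0, 1.0],
    "unrelated": [4.0, 4.0],
}
# A single hyperedge links all three participants of the joint crime at once,
# which an ordinary pairwise edge could not express.
hyperedges = {"joint_crime": ["defendant_A", "defendant_B", "drug"]}

updated = hypergraph_step(node_feats, hyperedges)
```

In this toy setting, the three nodes that share the `joint_crime` hyperedge move toward a common representation, while the node outside any hyperedge keeps its original feature. A trained model would presumably replace the means with learned transformations and attention.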
