From Graph to Word Bag: Introducing Domain Knowledge to Confusing Charge Prediction

Ang Li, Qiangchao Chen, Yiquan Wu, Ming Cai, Xiang Zhou, Fei Wu, Kun Kuang

TL;DR

This paper proposes From Graph to Word Bag (FWGB), an approach that incorporates domain knowledge about constituent elements to guide the model's judgments on confusing charges, much like a judge's reasoning process.

Abstract

Confusing charge prediction is a challenging task in legal AI that involves predicting easily confused charges from fact descriptions. While existing charge prediction methods have shown impressive performance, they face significant challenges when dealing with confusing charges, such as Snatch and Robbery. In the legal domain, constituent elements play a pivotal role in distinguishing confusing charges. Constituent elements are the fundamental behaviors underlying criminal punishment, and they differ only subtly among charges. In this paper, we propose a novel From Graph to Word Bag (FWGB) approach, which incorporates domain knowledge about constituent elements to guide the model in making judgments on confusing charges, much like a judge's reasoning process. Specifically, we first construct a legal knowledge graph containing constituent elements to help select keywords for each charge, forming a word bag. Subsequently, to guide the model's attention towards the differentiating information for each charge within the context, we expand the attention mechanism and introduce a new loss function that supervises attention using the words in the word bag. We construct a confusing-charges dataset from real-world judicial documents. Experiments demonstrate the effectiveness of our method, especially in maintaining strong performance under imbalanced label distributions.

Paper Structure

This paper contains 34 sections, 11 equations, 5 figures, and 5 tables.

Figures (5)

  • Figure 1: Confusing charges in real legal cases. Red words indicate the words all charges share, blue words indicate the words some charges share, and the yellow highlighted words indicate the unique words of each charge.
  • Figure 2: Structure of our model. The charge predictor uses an LSTM to encode fact descriptions, employs a multi-attention mechanism for label-independent attention scores, and derives probability distributions for each label. The word bag former transforms expert knowledge graphs into prerequisites, selecting genuine keywords from statistical data to create a word bag. The multi-attention supervisor assumes high attention values for label-related keywords, masking out irrelevant ones to guide the attention mechanism. Here, $L_c$ is the classification loss, and $L_s$ is the loss associated with attention supervision.
  • Figure 3: Construction and utilization of expert knowledge graphs.
  • Figure 4: Attention distribution from different models for a Snatch case. "SV" stands for using attention supervision, while "MSV" stands for using multi-attention supervision. They are both implemented on LSTM.
  • Figure 5: Model performance as a function of the attention-supervision coefficient $\lambda$.
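
The captions for Figures 2 and 5 mention a classification loss $L_c$, an attention-supervision loss $L_s$, and a coefficient $\lambda$, but this page does not give their formulas. The following is a minimal sketch of how such a combined objective could look, under stated assumptions: the form `L_c + lam * L_s`, the choice of `L_s` as the negative log of the attention mass on word-bag keywords, and every name and value in the code (`keyword_mask`, `supervision_loss`, `total_loss`, the toy tokens and word bags) are hypothetical illustrations and are not taken from the paper.

```python
import math


def keyword_mask(tokens, word_bag):
    """Mark each token with 1.0 if it appears in the charge's word bag, else 0.0."""
    return [1.0 if tok in word_bag else 0.0 for tok in tokens]


def supervision_loss(attention, mask, eps=1e-12):
    """Assumed form of L_s: negative log of the attention mass on keywords.

    Returns 0.0 when the text contains no keyword for the charge, so that
    such examples contribute no supervision signal.
    """
    if not any(mask):
        return 0.0
    mass = sum(a * m for a, m in zip(attention, mask))
    return -math.log(mass + eps)


def total_loss(class_probs, gold, attention_by_label, tokens, word_bags, lam=0.5):
    """Assumed combination L = L_c + lam * L_s for a single example."""
    l_c = -math.log(class_probs[gold])          # cross-entropy on the gold charge
    mask = keyword_mask(tokens, word_bags[gold])
    l_s = supervision_loss(attention_by_label[gold], mask)
    return l_c + lam * l_s, l_c, l_s


# Toy data, invented purely for illustration.
tokens = ["defendant", "grabbed", "phone", "and", "fled"]
word_bags = {
    "Snatch": {"grabbed", "fled"},
    "Robbery": {"threatened", "violence"},
}
class_probs = {"Snatch": 0.7, "Robbery": 0.3}
attention_by_label = {
    "Snatch": [0.05, 0.45, 0.10, 0.05, 0.35],   # one attention distribution per label
    "Robbery": [0.20, 0.20, 0.20, 0.20, 0.20],
}

loss, l_c, l_s = total_loss(class_probs, "Snatch", attention_by_label, tokens, word_bags)
print(f"L_c={l_c:.3f}  L_s={l_s:.3f}  total={loss:.3f}")
```

In this sketch, attention that concentrates on word-bag keywords of the gold charge lowers `L_s`, while `lam` controls how strongly that signal competes with the classification loss, which is the trade-off Figure 5 reports on.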