SEMDR: A Semantic-Aware Dual Encoder Model for Legal Judgment Prediction with Legal Clue Tracing
Pengjie Liu, Wang Zhang, Yulong Ding, Xuefeng Zhang, Shuang-Hua Yang
TL;DR
The paper tackles Legal Judgment Prediction (LJP), focusing on accurately distinguishing between confusable criminal charges. It introduces SEMDR, a semantic-aware dual encoder with a three-level legal clue tracing mechanism that enables fine-grained reasoning between criminal facts and instrument labels: Lexicon-Tracing, Sentence Representation Learning (contrastive training with dropout), and Multi-Fact Reasoning over a case-enhancement graph. The model learns robust criminal-fact representations and propagates clues through a graph attention network to refine the instrument-label embeddings. The prediction probability is defined as $P(L_{I/C/A} \mid H^{F}) = \mathrm{softmax}(\mathrm{sim}(H^{F}, \widetilde{H}^{L}))$. Empirical results on CAIL2018 show SEMDR achieving state-of-the-art performance, especially on low-frequency and confusing charges. Ablations confirm that graph reasoning contributes the most and that the three clue-tracing components work synergistically. The work advances LJP by reducing model uncertainty and learning more uniform, discriminative representations, which supports more robust judgment prediction and may benefit few-shot settings.
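The prediction rule above can be illustrated with a minimal sketch. It assumes that $\mathrm{sim}$ is cosine similarity and that $H^{F}$ and each row of $\widetilde{H}^{L}$ are plain vectors; the summary does not specify either, and the function names and toy vectors below are hypothetical.

```python
import math

def cosine(u, v):
    # Assumption: sim(.,.) is cosine similarity; the summary does not say.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def softmax(scores):
    # Subtract the maximum score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_label_probs(fact_vec, label_vecs):
    # P(L | H^F) = softmax(sim(H^F, H~^L)) over all candidate labels.
    return softmax([cosine(fact_vec, lv) for lv in label_vecs])

# Hypothetical toy embeddings: one fact vector and three label vectors.
fact = [0.9, 0.1, 0.0]
labels = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
probs = predict_label_probs(fact, labels)
predicted = probs.index(max(probs))
```

Because the labels are scored by similarity to the fact representation rather than by a fixed classification head, refining the label embeddings changes the prediction directly.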
Abstract
Legal Judgment Prediction (LJP) aims to form legal judgments based on the criminal fact description. However, researchers struggle to classify confusing criminal cases, such as robbery and theft, which requires LJP models to distinguish the nuances between similar crimes. Existing methods usually design handcrafted features to pick up the semantic legal clues needed for more accurate legal judgment predictions. In this paper, we propose a Semantic-Aware Dual Encoder Model (SEMDR), which designs a novel legal clue tracing mechanism to conduct fine-grained semantic reasoning between criminal facts and instruments. Our legal clue tracing mechanism is built from three reasoning levels: 1) Lexicon-Tracing, which aims to extract criminal facts from criminal descriptions; 2) Sentence Representation Learning, which contrastively trains language models to better represent confusing criminal facts; 3) Multi-Fact Reasoning, which builds a reasoning graph to propagate semantic clues among fact nodes and capture the subtle differences among criminal facts. Our legal clue tracing mechanism helps SEMDR achieve state-of-the-art performance on the CAIL2018 dataset and shows its advantage in few-shot scenarios. Our experiments show that SEMDR has a strong ability to learn more uniform and distinguishable representations for criminal facts, which helps it make more accurate predictions on confusing criminal cases and reduces model uncertainty when making judgments. All code will be released via GitHub.
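As an illustration of the Sentence Representation Learning level, the sketch below shows one common way to train contrastively with dropout: two dropout-perturbed views of the same fact embedding form a positive pair, and the other facts in the batch act as negatives under an InfoNCE loss. This objective, the temperature, the dropout rate, and all names here are assumptions for illustration, not details taken from the paper.

```python
import math
import random

def dropout(vec, p, rng):
    # Inverted dropout: zero each entry with probability p, rescale the rest.
    return [0.0 if rng.random() < p else x / (1.0 - p) for x in vec]

def cosine(u, v, eps=1e-12):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)  # eps guards all-zero vectors

def info_nce(views_a, views_b, temperature=0.05):
    # In-batch InfoNCE: views_a[i] and views_b[i] are the positive pair,
    # every views_b[j] with j != i is a negative. Temperature is assumed.
    losses = []
    for i, anchor in enumerate(views_a):
        logits = [cosine(anchor, other) / temperature for other in views_b]
        top = max(logits)
        log_denom = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)

# Hypothetical batch of four 8-dimensional fact embeddings.
rng = random.Random(0)
batch = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(4)]
view_a = [dropout(vec, 0.1, rng) for vec in batch]
view_b = [dropout(vec, 0.1, rng) for vec in batch]
loss = info_nce(view_a, view_b)
```

Minimizing this loss pulls the two views of each fact together and pushes different facts apart, which is one way to obtain the more uniform and distinguishable representations the abstract describes.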
