Enhancing Criminal Case Matching through Diverse Legal Factors

Jie Zhao, Ziyu Guan, Wei Zhao, Yue Jiang

TL;DR

This work addresses criminal case matching (CCM) by introducing Diverse Legal Factor-enhanced CCM (DLF-CCM). The method leverages three diverse legal factors (LFs), namely ARF, CRF, and TRF, to capture cross-case relevance beyond traditional instance-level semantics. It employs a two-stage framework. In the first stage, Judgment-Driven Pre-Training learns LF representations from a large-scale legal judgment prediction dataset. In the second stage, an LF De-Redundancy module splits the LFs into shared and exclusive components, and an Entropy-Weighted Fusion step adaptively combines the LF-specific relevance predictions using per-LF classifiers and entropy-based weights. The model is trained with losses for LF prediction ($L_{pt}$), LF exclusivity ($L_{ex}$), and LF sharing ($L_{sh}$) alongside the main matching loss ($L_{mat}$), balanced by hyperparameters $\lambda_1$ and $\lambda_2$, and is evaluated on LeCaRD and CAIL. Experiments show significant improvements over strong baselines, and ablations confirm the contributions of both LF de-redundancy and entropy-weighted fusion; qualitative analyses (t-SNE plots and fusion weights) indicate that shared LFs dominate confident predictions. By incorporating diverse, learned legal factors and principled fusion, the approach yields more accurate matching. Code is available at https://github.com/jiezhao6/DLF-CCM.
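To make the shared/exclusive split concrete, the sketch below shows one plausible form of the exclusivity and sharing objectives. It is an illustration under stated assumptions, not the paper's definition: the summary names $L_{ex}$ and $L_{sh}$ but not their formulas, so here the hypothetical `sharing_loss` pulls each LF's shared component toward a common centroid, and the hypothetical `exclusivity_loss` penalizes squared cosine overlap between exclusive components and the shared centroid, and between exclusive components themselves (an orthogonality-style penalty common in shared/private representation learning).

```python
import math
from itertools import combinations
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector, eps: float = 1e-12) -> float:
    # eps guards against division by zero for all-zero vectors.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)) + eps)


def centroid(vectors: Sequence[Vector]) -> List[float]:
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def sharing_loss(shared_parts: Sequence[Vector]) -> float:
    """Hypothetical L_sh: mean squared distance of each LF's shared part to the centroid.

    Equals zero when every LF yields the same shared component.
    """
    c = centroid(shared_parts)
    return sum(
        sum((x - y) ** 2 for x, y in zip(s, c)) for s in shared_parts
    ) / len(shared_parts)


def exclusivity_loss(
    shared_parts: Sequence[Vector], exclusive_parts: Sequence[Vector]
) -> float:
    """Hypothetical L_ex: squared cosine overlap of the exclusive parts with the
    shared centroid and with each other. Equals zero when all are mutually orthogonal.
    """
    c = centroid(shared_parts)
    loss = sum(cosine(e, c) ** 2 for e in exclusive_parts)
    loss += sum(cosine(a, b) ** 2 for a, b in combinations(exclusive_parts, 2))
    return loss
```

For example, three identical shared components `[1, 0, 0, 0]` give `sharing_loss(...) == 0.0`, and exclusive components along the remaining three axes give an exclusivity loss of zero; making the first exclusive component `[1, 1, 0, 0]` reintroduces redundancy and raises the loss to about 1.0.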

Abstract

Criminal case matching aims to determine the relevance between different criminal cases. Conventional methods predict relevance solely from instance-level semantic features and neglect the diverse legal factors (LFs) that are associated with diverse court judgments. Consequently, these approaches struggle to represent a criminal case comprehensively. Moreover, extracting and utilizing LFs for criminal case matching faces two challenges: (1) manual annotation of LFs relies heavily on specialized legal knowledge; (2) overlaps among LFs may harm the model's performance. In this paper, we propose a two-stage framework named Diverse Legal Factor-enhanced Criminal Case Matching (DLF-CCM). In stage one, DLF-CCM employs a multi-task learning framework to pre-train an LF extraction network on a large-scale legal judgment prediction dataset. In stage two, DLF-CCM introduces an LF de-redundancy module to learn a shared LF and exclusive LFs. In addition, an entropy-weighted fusion strategy dynamically fuses the multiple relevance predictions generated by the individual LFs. Experimental results validate the effectiveness of DLF-CCM and show significant improvements over competitive baselines. Code: https://github.com/jiezhao6/DLF-CCM.
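The entropy-weighted fusion idea can be illustrated with a short sketch. It assumes that each LF classifier outputs a probability distribution over relevance labels (four labels, 0 to 3, are assumed here, consistent with the labels 3 and 0 mentioned for Figure 3) and that the fusion weights are a softmax over negative entropies, so more confident (lower-entropy) LFs count more. The exact weighting formula in DLF-CCM may differ, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_weighted_fusion(
    per_lf_probs: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Fuse per-LF relevance distributions into one distribution.

    Assumption: weights are a softmax over negative entropies, so
    low-entropy (confident) LF classifiers receive larger weights.
    Returns (fused_distribution, weights).
    """
    weights = softmax([-entropy(p) for p in per_lf_probs])
    fused = [0.0] * len(per_lf_probs[0])
    for w, probs in zip(weights, per_lf_probs):
        for c, p in enumerate(probs):
            fused[c] += w * p
    return fused, weights


if __name__ == "__main__":
    confident = [0.05, 0.05, 0.10, 0.80]  # e.g. one LF strongly predicts label 3
    uncertain = [0.25, 0.25, 0.25, 0.25]  # another LF is uninformative
    fused, weights = entropy_weighted_fusion([confident, uncertain])
    print(weights, fused)
```

Because the fused output is a convex combination of the per-LF distributions, it remains a valid distribution, and an uninformative LF is down-weighted rather than discarded.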

Paper Structure (13 sections, 6 equations, 4 figures, 1 table)

Figures (4)

  • Figure 1: A criminal case (translated) with diverse LFs. The highlighted segments ①, ④, and ⑤ represent ARF; segments ①, ②, and ③ represent CRF; and segments ②, ③, ④, and ⑤ represent TRF.
  • Figure 2: The framework of DLF-CCM.
  • Figure 3: The t-SNE plot of LFs. Blue points represent source cases and red points represent target cases. The first and second rows show case pairs with labels 3 and 0, respectively.
  • Figure 4: The box plot of fusion weights.