Enhancing Criminal Case Matching through Diverse Legal Factors
Jie Zhao, Ziyu Guan, Wei Zhao, Yue Jiang
TL;DR
This work addresses criminal case matching (CCM) by introducing Diverse Legal Factor-enhanced CCM (DLF-CCM), which uses three legal factors (ARF, CRF, and TRF) to capture cross-case relevance beyond instance-level semantics. The framework has two stages. The first, Judgment-Driven Pre-Training, learns LF representations from a large legal judgment prediction dataset. The second applies an LF De-Redundancy module that splits the LFs into shared and exclusive components, followed by an Entropy-Weighted Fusion that combines LF-specific relevance predictions from per-LF classifiers using entropy-based weights. The model is trained with losses for LF prediction ($L_{pt}$), LF exclusivity ($L_{ex}$), and LF sharing ($L_{sh}$) alongside the main matching loss ($L_{mat}$), with hyperparameters $\lambda_1$ and $\lambda_2$ balancing the loss terms, and is evaluated on LeCaRD and CAIL. Experiments show significant improvements over strong baselines, and ablations confirm the contributions of LF de-redundancy and entropy-weighted fusion. Qualitative analyses (t-SNE and fusion weights) indicate that shared LFs dominate confident predictions. Code is available at https://github.com/jiezhao6/DLF-CCM.
Abstract
Criminal case matching aims to determine the relevance between different criminal cases. Conventional methods predict relevance solely from instance-level semantic features and neglect the diverse legal factors (LFs), which are associated with diverse court judgments. Consequently, comprehensively representing a criminal case remains a challenge for these approaches. Moreover, extracting and utilizing these LFs for criminal case matching face two challenges: (1) manual annotation of LFs relies heavily on specialized legal knowledge; (2) overlaps among LFs may harm the model's performance. In this paper, we propose a two-stage framework named Diverse Legal Factor-enhanced Criminal Case Matching (DLF-CCM). In stage one, DLF-CCM employs a multi-task learning framework to pre-train an LF extraction network on a large-scale legal judgment prediction dataset. In stage two, DLF-CCM introduces an LF de-redundancy module to learn a shared LF and exclusive LFs. Moreover, an entropy-weighted fusion strategy is introduced to dynamically fuse the multiple relevance predictions generated by the LFs. Experimental results validate the effectiveness of DLF-CCM and show significant improvements over competitive baselines. Code: https://github.com/jiezhao6/DLF-CCM.
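
To make the idea of entropy-weighted fusion concrete, the following is a minimal sketch and not the authors' implementation. It assumes each LF-specific classifier outputs a probability distribution over relevance labels, and that the fusion weights are a softmax over negative entropies, so that more confident (lower-entropy) predictions receive larger weights. The function names, the weighting formula, and the example numbers are all hypothetical; the exact formulation used in DLF-CCM may differ.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_weighted_fusion(per_lf_probs):
    """Fuse per-LF relevance distributions, favouring low-entropy ones.

    Assumption: weights are a softmax over negative entropies. This is an
    illustrative choice, not necessarily the formula used in the paper.
    """
    neg_entropies = [-entropy(p) for p in per_lf_probs]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(neg_entropies)
    exps = [math.exp(h - m) for h in neg_entropies]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted average of the per-LF distributions, class by class.
    n_classes = len(per_lf_probs[0])
    fused = [
        sum(w * p[c] for w, p in zip(weights, per_lf_probs))
        for c in range(n_classes)
    ]
    return fused, weights


# Hypothetical relevance distributions from three LF-specific classifiers.
per_lf = [
    [0.90, 0.05, 0.05],  # confident prediction, low entropy
    [0.40, 0.30, 0.30],  # uncertain prediction
    [0.34, 0.33, 0.33],  # near-uniform prediction, highest entropy
]
fused, weights = entropy_weighted_fusion(per_lf)
```

In this sketch the first classifier receives the largest weight because its distribution has the lowest entropy, and the fused output remains a valid probability distribution because the weights sum to one.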
