Towards an In-Depth Comprehension of Case Relevance for Better Legal Retrieval
Haitao Li, You Chen, Zhekai Ge, Qingyao Ai, Yiqun Liu, Quan Zhou, Shuai Huo
TL;DR
Addressing how to model case relevance for legal retrieval, the paper combines lexical and semantic signals with learning-to-rank to improve document relevance judgments. The authors implement a pipeline with pre-processing, lexical models (BM25, QLD, BM25_ngram), semantic retrieval (SAILER, DELTA), and a LightGBM-based fusion, plus task-specific post-processing, evaluated on COLIEE2024 Task 1 and Task 3. They report first place in Task 1 and third place in Task 3, illustrating the effectiveness and robustness of the hybrid approach. The results suggest that integrating structural understanding of legal texts and dynamic filtering can advance practical legal-information retrieval systems, with future work aimed at incorporating richer legal knowledge.
Abstract
Legal retrieval techniques play an important role in preserving the fairness and equality of the judicial system. COLIEE, a well-known annual international competition, aims to advance state-of-the-art retrieval models for legal texts. This paper describes the methodology employed by the TQM team in COLIEE2024. Specifically, we explore various lexical matching and semantic retrieval models, with a focus on enhancing the understanding of case relevance. We also integrate the resulting features using the learning-to-rank technique. In addition, we propose carefully designed heuristic pre-processing and post-processing methods to filter out irrelevant information. Our methodology secured first place in Task 1 and third place in Task 3 of COLIEE2024. We anticipate that the proposed approach can contribute valuable insights to the advancement of legal retrieval technology.
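To make the combination of lexical and semantic signals concrete, the following is a minimal, dependency-free Python sketch and not the TQM team's implementation. It computes standard BM25 scores, takes semantic scores as given, and fuses the two with a fixed weighted sum after min-max normalization. The fixed weights are a deliberate stand-in for the learned LightGBM ranker mentioned in the TL;DR; the function names, the toy documents, and the `semantic` values are all hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score every document against the query with a standard BM25 formula."""
    n_docs = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter()
    for doc in docs_tokens:
        df.update(set(doc))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def min_max(values):
    """Rescale scores to [0, 1] so different signals are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def fuse(lexical, semantic, w_lex=0.5, w_sem=0.5):
    """Weighted sum of normalized scores (assumed stand-in for a learned ranker)."""
    lex_n, sem_n = min_max(lexical), min_max(semantic)
    return [w_lex * l + w_sem * s for l, s in zip(lex_n, sem_n)]


def rank(fused):
    """Return document indices ordered from most to least relevant."""
    return sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)


# Toy candidate cases and query (hypothetical, whitespace-tokenized).
docs = [
    "the court dismissed the appeal on immigration grounds".split(),
    "the tax assessment was upheld by the tribunal".split(),
    "judicial review of an immigration officer decision".split(),
]
query = "immigration judicial review".split()

# Hypothetical semantic similarities, e.g. from a dense retriever.
semantic = [0.40, 0.10, 0.90]

ranking = rank(fuse(bm25_scores(query, docs), semantic))
print(ranking)  # [2, 0, 1]
```

In a learning-to-rank setting, the normalized lexical and semantic scores would instead serve as input features to a trained ranker, which learns how to weight them from labeled relevance data rather than relying on hand-set weights.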
