CAPTAIN at COLIEE 2023: Efficient Methods for Legal Information Retrieval and Entailment Tasks
Chau Nguyen, Phuong Nguyen, Thanh Tran, Dat Nguyen, An Trieu, Tin Pham, Anh Dang, Le-Minh Nguyen
TL;DR
This paper presents the CAPTAIN team's methods for three COLIEE 2023 tasks: Task 2 (case law entailment), Task 3 (statute law retrieval), and Task 4 (legal textual entailment). For Task 2, it fine-tunes MonoT5 with hard negative mining and ensembling. For Task 3, it trains sub-models on diverse data and fuses them in an ensemble. For Task 4, it deploys three strategies: online data augmentation, condition-statement extraction, and SVM ensembles. The team placed first in Task 2 and Task 3 and obtained promising results in Task 4. Practical contributions include public code and insights into data filtering, model ensembling, and SRL-based reasoning that may carry over to real-world legal NLP applications.
Abstract
The Competition on Legal Information Extraction/Entailment (COLIEE) is held annually to encourage advancements in the automatic processing of legal texts. Processing legal documents is challenging due to the intricate structure and meaning of legal language. In this paper, we outline our strategies for tackling Task 2, Task 3, and Task 4 in the COLIEE 2023 competition. Our approach involved utilizing appropriate state-of-the-art deep learning methods, designing methods based on observed domain characteristics, and applying meticulous engineering practices and methodologies to the competition. As a result, we achieved first place in both Task 2 and Task 3, and promising results in Task 4. Our source code is available at https://github.com/Nguyen2015/CAPTAIN-COLIEE2023/tree/coliee2023.
