
Employing Label Models on ChatGPT Answers Improves Legal Text Entailment Performance

Chau Nguyen, Le-Minh Nguyen

TL;DR

The paper addresses legal text entailment on COLIEE 2022 by leveraging ChatGPT outputs. It introduces a two-stage approach: prompt engineering to obtain provisional answers, then weak supervision label models to consolidate those provisional predictions into final labels. Empirically, a generative label model applied to 10 provisional answers sampled at temperature 0.5 achieves 76.15% accuracy, 8.26 percentage points above the previous SOTA of 67.89%. An error analysis identifies four categories of ChatGPT reasoning failures, offering guidance for future refinements and for related legal NLP tasks.

Abstract

The objective of legal text entailment is to determine whether the assertions in a legal query logically follow from the information in one or more legal articles. ChatGPT, a large language model, performs robustly on many natural language processing tasks, including legal text entailment: with the temperature set to 0 (making ChatGPT's answers deterministic), the prompted model achieves 70.64% accuracy on the COLIEE 2022 dataset, outperforming the previous SOTA of 67.89%. When the temperature is greater than zero, however, ChatGPT's answers are no longer deterministic, leading to inconsistent answers and fluctuating results. We propose to leverage label models (a fundamental component of weak supervision techniques) to integrate ChatGPT's provisional answers into consolidated labels. In this way, we treat the provisional answers as noisy predictions that label models can consolidate. Experimental results show that this approach attains an accuracy of 76.15%, an improvement of 8.26 percentage points over the prior state-of-the-art. Additionally, we analyze the instances where ChatGPT answers incorrectly and classify the errors, offering insights that could guide future research.
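
Treating sampled answers as noisy votes means any label model can serve as the consolidation step. A common baseline label model in weak supervision is majority vote. The sketch below is such a baseline, offered for illustration and not as the method evaluated in the paper. It assumes a hypothetical encoding of 1 = Yes, 0 = No, and ABSTAIN = -1. It also assumes that a tie, such as 5 Yes against 5 No among 10 samples, yields no label unless a `tie_break` value is supplied. The names are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence

ABSTAIN = -1  # hypothetical code for an unparseable provisional answer


def majority_vote(row: Sequence[int], tie_break: Optional[int] = None) -> Optional[int]:
    """Consolidate one query's provisional answers by plurality vote.

    Abstentions are ignored; a tie or an all-abstain row returns `tie_break`,
    which defaults to None (no consolidated label).
    """
    counts = Counter(v for v in row if v != ABSTAIN)
    if not counts:
        return tie_break
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return tie_break
    return ranked[0][0]


def accuracy(preds: Sequence[Optional[int]], gold: Sequence[int]) -> float:
    """Fraction of queries whose consolidated label equals the gold label (None counts as wrong)."""
    return sum(p == g for p, g in zip(preds, gold)) / len(gold)


# Toy example (made-up data): three queries, four sampled answers each.
L = [[1, 1, 0, 1], [0, 0, 1, ABSTAIN], [1, 0, ABSTAIN, ABSTAIN]]
preds = [majority_vote(row) for row in L]  # [1, 0, None]
print(preds, accuracy(preds, [1, 0, 1]))
```

Majority vote weights every sample equally. Learned label models instead estimate how reliable each source is and weight the sources accordingly.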

Paper Structure (10 sections, 2 figures, 4 tables)