AI and the Law: Evaluating ChatGPT's Performance in Legal Classification

Pawel Weichbroth

TL;DR

This study addresses whether ChatGPT can accurately classify evidence under the Polish Penal Code using Polish-language notes. It builds a self-constructed, balanced dataset of 268 text notes (134 positive, 134 negative) and evaluates classification performance with a confusion-matrix framework, complemented by qualitative checks of the legal basis. The results indicate perfect accuracy across all cases, with correct legal reasoning and applicable paragraphs for each classification, suggesting strong potential for AI-assisted legal analysis in Polish. The work also discusses limitations, notably generalizability due to crime-type scope and dataset size, and points to future work including larger datasets, personalization, and multi-modal evidence analysis.
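The confusion-matrix framework mentioned above can be illustrated with a minimal sketch. It assumes the positive class is encoded as 1 and the negative class as 0, and it reconstructs the predictions from the reported outcome (every note classified correctly) rather than from the study's actual data. The names `ConfusionMatrix` and `from_labels` are hypothetical helpers introduced for illustration, not part of the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier (hypothetical helper, not from the paper)."""
    tp: int  # positive notes classified as positive
    fp: int  # negative notes classified as positive
    tn: int  # negative notes classified as negative
    fn: int  # positive notes classified as negative

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.tn + self.fn

    @staticmethod
    def _ratio(numerator: int, denominator: int) -> float:
        # Return 0.0 instead of raising when a class is absent.
        return numerator / denominator if denominator else 0.0

    def accuracy(self) -> float:
        return self._ratio(self.tp + self.tn, self.total)

    def precision(self) -> float:
        return self._ratio(self.tp, self.tp + self.fp)

    def recall(self) -> float:
        return self._ratio(self.tp, self.tp + self.fn)

    def f1(self) -> float:
        return self._ratio(2 * self.tp, 2 * self.tp + self.fp + self.fn)


def from_labels(y_true: Sequence[int], y_pred: Sequence[int]) -> ConfusionMatrix:
    """Tally the four confusion-matrix cells from paired 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return ConfusionMatrix(tp=tp, fp=fp, tn=tn, fn=fn)


# Balanced dataset as described: 134 positive and 134 negative notes.
y_true = [1] * 134 + [0] * 134
# Assumption: predictions mirror the labels, matching the reported result.
y_pred = list(y_true)

cm = from_labels(y_true, y_pred)
print(cm, cm.accuracy(), cm.precision(), cm.recall(), cm.f1())
```

With no false positives or false negatives, accuracy, precision, recall, and F1 all equal 1.0, which is what "perfect accuracy across all cases" amounts to on a balanced set of 268 notes.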

Abstract

The use of ChatGPT to analyze and classify evidence in criminal proceedings has been a topic of ongoing discussion. However, to the best of our knowledge, this issue has not been studied in the context of the Polish language. This study addresses this research gap by evaluating the effectiveness of ChatGPT in classifying legal cases under the Polish Penal Code. The results show excellent binary classification accuracy, with all positive and negative cases correctly categorized. In addition, a qualitative evaluation confirms that the legal basis provided for each case, along with the relevant legal content, was appropriate. The results obtained suggest that ChatGPT can effectively analyze and classify evidence while applying the appropriate legal rules. In conclusion, ChatGPT has the potential to assist interested parties in the analysis of evidence and serve as a valuable legal resource for individuals with less experience or knowledge in this area.
