Empowering Prior to Court Legal Analysis: A Transparent and Accessible Dataset for Defensive Statement Classification and Interpretation

Yannis Spyridis, Jean-Paul, Haneen Deeb, Vasileios Argyriou

TL;DR

This paper addresses the lack of domain-specific data for defensive statement analysis in the legal NLP space by introducing a police-interview statement dataset and a DistilBERT-based classifier. A fine-tuned DistilBERT model distinguishes truthful from deceptive statements and is augmented with gradient-based saliency maps for interpretability. An interactive XAI interface enables legal professionals and researchers to explore predictions and explanations. The results show 86% accuracy and favorable ROC-AUC, demonstrating the feasibility of end-to-end, transparent statement analysis in pre-court contexts.

Abstract

The classification of statements provided by individuals during police interviews is a complex and significant task within natural language processing (NLP) and legal informatics. The lack of extensive domain-specific datasets poses challenges for the advancement of NLP methods in the field. This paper aims to address some of these challenges by introducing a novel dataset tailored for the classification of statements made during police interviews, prior to court proceedings. Utilising the curated dataset for training and evaluation, we introduce a fine-tuned DistilBERT model that achieves state-of-the-art performance in distinguishing truthful from deceptive statements. To enhance interpretability, we employ explainable artificial intelligence (XAI) methods, using saliency maps to interpret the model's decision-making process. Lastly, we present an XAI interface that empowers both legal professionals and non-specialists to interact with and benefit from our system. Our model achieves an accuracy of 86% and is shown to outperform a custom transformer architecture in a comparative study. This holistic approach advances the accessibility, transparency, and effectiveness of statement analysis, with promising implications for both legal practice and research.
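To make the idea of a gradient-based saliency map concrete, the following is a minimal sketch of one common variant, gradient-times-input, applied to a toy mean-pooled linear classifier. It is not the authors' implementation and does not use DistilBERT: the embeddings, weights, bias, and the names `EMBEDDINGS`, `WEIGHTS`, `BIAS`, `predict`, and `saliency` are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical 2-d token embeddings and classifier parameters.
# Illustrative values only; not taken from the paper or from DistilBERT.
EMBEDDINGS = {
    "i": [0.1, 0.0],
    "was": [0.0, 0.1],
    "never": [0.9, -0.4],
    "there": [0.2, 0.3],
}
WEIGHTS = [1.5, -2.0]
BIAS = -0.1


def predict(tokens):
    """Return P(deceptive) from a mean-pooled linear classifier."""
    n = len(tokens)
    dims = range(len(WEIGHTS))
    pooled = [sum(EMBEDDINGS[t][d] for t in tokens) / n for d in dims]
    logit = sum(w * x for w, x in zip(WEIGHTS, pooled)) + BIAS
    return 1.0 / (1.0 + math.exp(-logit))


def saliency(tokens):
    """Return (token, score) pairs using gradient-times-input.

    Under mean pooling, d(logit)/d(embedding_i) equals WEIGHTS / n for
    every token, so each raw score is |gradient . embedding_i|.
    Scores are normalised to sum to 1.
    """
    n = len(tokens)
    raw = [
        abs(sum((w / n) * x for w, x in zip(WEIGHTS, EMBEDDINGS[t])))
        for t in tokens
    ]
    total = sum(raw) or 1.0  # guard against an all-zero saliency vector
    return [(t, r / total) for t, r in zip(tokens, raw)]


if __name__ == "__main__":
    statement = ["i", "was", "never", "there"]
    print("P(deceptive) =", round(predict(statement), 3))
    for token, score in saliency(statement):
        print(f"{token:>6}: {score:.3f}")
```

With these made-up values, the token "never" receives the largest share of the saliency. In a real transformer the gradient would be obtained by backpropagation through the network rather than in closed form, but the per-token scoring step follows the same pattern.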

Paper Structure (17 sections, 4 equations, 4 figures, 5 tables)

Figures (4)

  • Figure 1: The end-to-end system pipeline for defensive statement classification and explainability.
  • Figure 2: Confusion matrix of the fine-tuned model.
  • Figure 3: The XAI interface for defensive statement classification and interpretability.
  • Figure 4: Visualising the attention in each layer.