A Hierarchical Neural Framework for Classification and its Explanation in Large Unstructured Legal Documents

Nishchal Prasad, Mohand Boughanem, Taoufik Dkaki

TL;DR

This work tackles judgment prediction from long, scarce-annotated legal documents with non-uniform structure. It introduces two components. The first is MESc, a four-stage hierarchical classifier that divides documents into chunks, learns chunk-level representations, and uses unsupervised clustering to approximate document structure before final classification. The second is ORSE, an occlusion-based method for extractive explanations. Empirically, MESc achieves about a 2-point improvement over the state of the art on ILDC and LexGLUE, with larger gains when incorporating the inferred structure labels and when using larger language models; ORSE provides up to a 50% improvement in explanation quality on ILDC_Expert. The approach demonstrates that combining hierarchical chunk representations with structure-aware signals and post-hoc extractive explanations can effectively address the challenges of long, unstructured legal texts, and it offers a practical path toward interpretable legal AI systems.

Abstract

Automatic legal judgment prediction and its explanation are hampered by long case documents, which generally exceed tens of thousands of words and have a non-uniform structure. Predicting judgments from such documents and extracting explanations for them is challenging, more so when the documents carry no structural annotation. We define this problem as "scarce annotated legal documents" and address their lack of structural information and their length with a deep-learning-based classification framework for judgment prediction, which we call MESc ("Multi-stage Encoder-based Supervised with-clustering"). We explore the adaptability of LLMs with multi-billion parameters (GPT-Neo and GPT-J) to legal texts and their intra-domain (legal) transfer learning capacity. Alongside this, we compare their performance and adaptability with MESc and study the impact of combining embeddings from their last layers. For such hierarchical models, we also propose an explanation extraction algorithm named ORSE (Occlusion sensitivity-based Relevant Sentence Extractor), which uses the input-occlusion sensitivity of the model to explain predictions with the most relevant sentences from the document. We test the effectiveness of these methods with extensive experiments and ablation studies on legal documents from India, the European Union, and the United States, using the ILDC dataset and a subset of the LexGLUE dataset. MESc achieves a minimum total performance gain of approximately 2 points over previous state-of-the-art methods, while ORSE applied to MESc achieves a total average gain of 50% over the baseline explainability scores.
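
To make the idea of input-occlusion sensitivity concrete, the following is a minimal Python sketch of generic leave-one-out occlusion over sentences. It is not the ORSE algorithm itself, whose details are not given here. The function `occlusion_ranking`, the `predict_proba` callback, and the keyword-based `toy_predict_proba` scorer are all hypothetical names introduced for illustration; a real setting would substitute the trained classifier's probability for the predicted label.

```python
from typing import Callable, List, Tuple


def occlusion_ranking(
    units: List[str],
    predict_proba: Callable[[List[str]], float],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Rank text units by the drop in predicted probability when each is removed."""
    baseline = predict_proba(units)
    scored = []
    for i, unit in enumerate(units):
        # Occlude exactly one unit and re-score the remaining text.
        occluded = units[:i] + units[i + 1:]
        drop = baseline - predict_proba(occluded)
        scored.append((unit, drop))
    # A larger drop means the model relied more on that unit.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def toy_predict_proba(units: List[str]) -> float:
    """Hypothetical stand-in for a trained classifier: keyword weights in tenths."""
    weights = {"appeal": 3, "allowed": 4, "costs": 1}
    score = sum(
        w for u in units for word, w in weights.items() if word in u.lower()
    )
    return min(10, score) / 10


sentences = [
    "The appeal is allowed.",
    "The hearing was on Monday.",
    "No order as to costs.",
]
top = occlusion_ranking(sentences, toy_predict_proba, top_k=2)
for sentence, drop in top:
    print(f"{drop:.2f}  {sentence}")
```

The sketch re-scores the document once per occluded sentence, which is simple but costs one forward pass per unit. For a hierarchical model, one plausible refinement (an assumption, not something stated above) is to occlude whole chunks first and then examine sentences only within the most influential chunks.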

Paper Structure (20 sections, 7 equations, 2 figures, 5 tables, 1 algorithm)