Exploring Large Language Models and Hierarchical Frameworks for Classification of Large Unstructured Legal Documents

Nishchal Prasad, Mohand Boughanem, Taoufiq Dkaki

TL;DR

The paper tackles judgment prediction for very long legal documents that lack explicit structural annotation. It introduces MESc, a hierarchical, multi-stage framework that divides each document into chunks, learns chunk-level representations from the last four layers of a fine-tuned encoder, and uses unsupervised clustering (HDBSCAN) with dimensionality reduction (pUMAP) to infer document structure, which is fused with a global representation for final classification. Across ILDC and LexGLUE subsets (ECtHR A/B and SCOTUS), MESc improves over state-of-the-art baselines. The gains are amplified by concatenating embeddings from multiple last layers and by incorporating the approximated structure. The paper also explores intra-domain transfer using billion-parameter LLMs such as GPT-Neo and GPT-J, especially with longer input lengths. The work demonstrates scalable processing of long legal texts, reveals the transfer potential of LLMs across legal domains, and suggests future directions in explainability and broader jurisdictional applicability while maintaining ethical safeguards. The approach formalizes its key components as chunk embeddings $E_{i,D} \in \mathbb{R}^{l \times d}$ and a final representation $G\bigl(E_{D}^{(p)}\bigr) \in \mathbb{R}^{128}$, integrating structure labels $S_D$ to produce outputs $O(D) \in \mathbb{R}^{u}$ for judgments.
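
The first two steps named above, chunking a document and concatenating representations from the last four encoder layers, can be illustrated with a minimal, dependency-free sketch. Everything in it is an assumption made for illustration: the function names `split_into_chunks` and `concat_last_layers`, the non-overlapping chunking, the chunk length of 4 tokens, and the fabricated per-layer vectors are hypothetical and are not taken from the paper, which obtains these vectors from a fine-tuned encoder.

```python
from typing import List


def split_into_chunks(tokens: List[str], chunk_len: int) -> List[List[str]]:
    """Split a token list into consecutive chunks of at most chunk_len tokens.

    Assumption: non-overlapping chunks; the summary does not say how chunks are cut.
    """
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")
    return [tokens[i:i + chunk_len] for i in range(0, len(tokens), chunk_len)]


def concat_last_layers(layer_vectors: List[List[float]], p: int = 4) -> List[float]:
    """Concatenate one chunk's vectors from the last p encoder layers."""
    if not 1 <= p <= len(layer_vectors):
        raise ValueError("p must be between 1 and the number of layers")
    out: List[float] = []
    for vec in layer_vectors[-p:]:
        out.extend(vec)
    return out


# Toy document: 10 tokens, chunk length 4, giving chunks of 4, 4 and 2 tokens.
tokens = [f"tok{i}" for i in range(10)]
chunks = split_into_chunks(tokens, chunk_len=4)

# Hypothetical per-layer vectors for one chunk: 6 layers, 3 dimensions each.
# A real pipeline would read these from the encoder's hidden states.
fake_layers = [[float(layer)] * 3 for layer in range(6)]
chunk_vector = concat_last_layers(fake_layers, p=4)

print(len(chunks), [len(c) for c in chunks])  # 3 [4, 4, 2]
print(len(chunk_vector))                      # 12 = 4 layers x 3 dimensions
```

Concatenating rather than averaging keeps the layers distinguishable for whatever model consumes the vector, at the cost of a representation that is `p` times wider.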

Abstract

Legal judgment prediction suffers from the problem of long case documents, which generally exceed tens of thousands of words and have a non-uniform structure. Predicting judgments from such documents is a challenging task, more so for documents with no structural annotation. We explore the classification of these large legal documents, and their lack of structural information, with a deep-learning-based hierarchical framework for judgment prediction that we call MESc ("Multi-stage Encoder-based Supervised with-clustering"). Specifically, we divide a document into parts, extract their embeddings from the last four layers of a custom fine-tuned Large Language Model, and try to approximate their structure through unsupervised clustering. These are then used in another set of transformer encoder layers to learn the inter-chunk representations. We analyze the adaptability of multi-billion-parameter Large Language Models (LLMs), namely GPT-Neo and GPT-J, to the hierarchical framework of MESc and compare them with their standalone performance on legal texts. We also study their intra-domain (legal) transfer learning capability and the impact of combining embeddings from their last layers in MESc. We test these methods and their effectiveness with extensive experiments and ablation studies on legal documents from India, the European Union, and the United States, using the ILDC dataset and a subset of the LexGLUE dataset. Our approach achieves a minimum total performance gain of approximately 2 points over previous state-of-the-art methods.
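
To make the idea of approximating structure through unsupervised clustering concrete, here is a small sketch that assigns a cluster id (a "structure label") to each chunk vector. Note the substitution: the paper uses HDBSCAN with pUMAP, whereas this sketch swaps in plain k-means with a fixed `k` purely to stay dependency-free. The function names, the 2-dimensional toy vectors, and the deterministic initialization are all hypothetical choices for illustration.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def toy_structure_labels(vectors: List[List[float]], k: int, n_iter: int = 10) -> List[int]:
    """Assign each chunk vector a cluster id using plain k-means.

    Stand-in only: k-means replaces the HDBSCAN + pUMAP step named in the
    paper, and unlike HDBSCAN it needs the number of clusters k up front.
    """
    if not 1 <= k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    # Deterministic initialization (assumption): the first k vectors.
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every vector.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy chunk vectors pooled from two documents (fabricated values).
chunk_vectors = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.2, 4.9]]
labels = toy_structure_labels(chunk_vectors, k=2)
print(labels)  # [0, 1, 0, 1]
```

Each chunk can then carry its label alongside its embedding into the later encoder layers; how exactly the two are combined is not specified in this summary, so the sketch stops at producing the labels.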

Paper Structure (15 sections, 5 equations, 3 figures, 2 tables)

Figures (3)

  • Figure 1: Multi-stage Encoder-based Supervised with-clustering (MESc) framework.
  • Figure 2: An example of clustering of chunk representations of two documents to generate structure labels.
  • Figure 4: $\mu$-F1 per chunk number for GPT-J ($\gamma$) vs. MESc (GPT-J* ($\gamma$)) in SCOTUS on both the Validation (Val.) and Test sets.