MARRO: Multi-headed Attention for Rhetorical Role Labeling in Legal Documents

Purbid Bambroo, Subinay Adhikary, Paheli Bhattacharya, Abhijnan Chakraborty, Saptarshi Ghosh, Kripabandhu Ghosh

TL;DR

This work tackles rhetorical role labeling in lengthy legal documents by introducing MARRO, a family of models that fuse transformer-inspired multi-headed attention with BiLSTM-CRF and multitask learning. By using label-shift as an auxiliary task and exploring both sent2vec and LEGAL-BERT-based embeddings, MARRO achieves state-of-the-art results on two datasets from the Indian and UK Supreme Courts, with significant improvement on the Indian data ($F1=0.724$, $p=0.0013$) and competitive results on the UK data ($F1=0.617$, $p=0.0891$). A new Indian dataset is introduced, expanding the labeled resource to 150 documents and $30{,}729$ sentences, and the approach is shown to benefit from domain-tailored embeddings. The work demonstrates the practical potential of context-enriched, multitask attention models for legal document understanding and downstream tasks like summarization and judgment prediction, while also discussing bottlenecks and future directions for cross-country applicability and tooling.
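
To illustrate the attention component, here is a minimal sketch of standard transformer-style multi-head self-attention applied across the sentence embeddings of one document. Each sentence vector can then draw context from every other sentence before a sequence model such as a BiLSTM-CRF labels it. The function name, the number of heads, the dimensions and the randomly initialised projection matrices (`Wq`, `Wk`, `Wv`, `Wo`) are assumptions for illustration only. This is not MARRO's exact configuration or code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max along the axis for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, num_heads):
    """Scaled dot-product self-attention over the sentences of one document.

    X: (n_sentences, d_model) sentence embeddings (e.g. sent2vec or LEGAL-BERT vectors).
    Wq, Wk, Wv, Wo: (d_model, d_model) projections (hypothetical, random here).
    Returns contextualised sentence vectors (n_sentences, d_model) and the
    per-head attention weights (num_heads, n_sentences, n_sentences).
    """
    n, d_model = X.shape
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads

    # Project, then split the feature dimension into heads: (heads, n, d_head).
    def split(M):
        return M.reshape(n, num_heads, d_head).transpose(1, 0, 2)

    Q, K, V = split(X @ Wq), split(X @ Wk), split(X @ Wv)

    # Every head scores each sentence against every sentence in the document.
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, n, n)
    weights = softmax(scores, axis=-1)                   # rows sum to 1
    heads = weights @ V                                  # (heads, n, d_head)

    # Concatenate the heads back to (n, d_model) and apply the output projection.
    concat = heads.transpose(1, 0, 2).reshape(n, d_model)
    return concat @ Wo, weights

rng = np.random.default_rng(0)
n_sentences, d_model, num_heads = 6, 16, 4
X = rng.normal(size=(n_sentences, d_model))
Wq, Wk, Wv, Wo = (rng.normal(scale=d_model ** -0.5, size=(d_model, d_model)) for _ in range(4))
out, attn = multi_head_self_attention(X, Wq, Wk, Wv, Wo, num_heads)
print(out.shape, attn.shape)  # (6, 16) (4, 6, 6)
```

Splitting `d_model` across heads keeps the cost comparable to a single head, while letting different heads attend to different sentence-to-sentence relations.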

Abstract

Identification of rhetorical roles like facts, arguments, and final judgments is central to understanding a legal case document and can lend power to other downstream tasks like legal case summarization and judgment prediction. However, there are several challenges to this task. Legal documents are often unstructured and contain a specialized vocabulary, making it hard for conventional transformer models to understand them. Additionally, these documents run into several pages, which makes it difficult for neural models to capture the entire context at once. Lastly, there is a dearth of annotated legal documents to train deep learning models. Previous state-of-the-art approaches for this task have focused on using neural models like BiLSTM-CRF or have explored different embedding techniques to achieve decent results. While such techniques have shown that better embeddings can result in improved model performance, few models have used attention to learn better embeddings for the sentences of a document. Additionally, it has recently been shown that advanced techniques like multi-task learning can help models learn better representations, thereby improving performance. In this paper, we combine these two aspects by proposing a novel family of multi-task learning-based models for rhetorical role labeling, named MARRO, that uses transformer-inspired multi-headed attention. Using label shift as an auxiliary task, we show that models from the MARRO family achieve state-of-the-art results on two labeled datasets for rhetorical role labeling, from the Indian and UK Supreme Courts.
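
The following minimal sketch shows how label-shift targets could be derived from a gold rhetorical-role sequence. It assumes the common formulation in which the auxiliary task predicts, for each sentence, whether its role differs from the role of the preceding sentence. The function name, the choice to label the first sentence 0, and the role strings other than facts, arguments, and final judgment are illustrative assumptions. They are not taken from the MARRO code.

```python
from typing import List, Sequence

def label_shift_targets(roles: Sequence[str]) -> List[int]:
    """Auxiliary-task targets: 1 where a sentence's rhetorical role differs
    from the previous sentence's role, 0 otherwise.

    The first sentence has no predecessor, so it is assigned 0 here
    (an assumed convention; others are possible).
    """
    targets = [0] * len(roles)
    for i in range(1, len(roles)):
        targets[i] = int(roles[i] != roles[i - 1])
    return targets

# Hypothetical role sequence for six consecutive sentences of a judgment.
roles = ["Facts", "Facts", "Argument", "Argument", "Statute", "Final judgment"]
print(label_shift_targets(roles))  # [0, 0, 1, 0, 1, 1]
```

Because these targets are computed from the existing role annotations, the auxiliary task needs no extra labeling effort. This matters given the scarcity of annotated legal documents noted above.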

Paper Structure

This paper contains 11 sections, 5 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Label descriptions substituted into the corresponding slots of the prompt template shown in Figure 2.
  • Figure 2: The structure of the prompt used in our ICL experiments. [X] denotes a variable to be replaced by its value. Figure 3 shows a concrete instance of this template with values substituted for the context variables.
  • Figure 3: Values of tags, descriptions, and annotated spans substituted into the corresponding slots of the prompt template shown in Figure 2.
  • Figure 4: The model architecture for MARRO$_{base}$ and TF-MARRO.
  • Figure 5: The model architecture for the MTL-based MARRO models (MTL-MARRO and MTL-TF-MARRO). The embeddings are shared between the label shift prediction module and the rhetorical role prediction module during training; at inference time, only the rhetorical role prediction module is used.
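
The training and inference split described for the MTL-based models can be sketched as one shared encoder feeding two heads. Training minimises a weighted sum of the rhetorical-role loss and the label-shift loss, and prediction uses only the role head. This is a hypothetical illustration. It makes several assumptions: a linear+tanh encoder stands in for MARRO's attention/BiLSTM encoder, per-sentence softmax cross-entropy replaces the CRF layers, and the class name `SharedEncoderMTL` and the weight `lam` are invented. The actual loss weighting used in MARRO is not stated here.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def cross_entropy(logits, targets):
    # Mean negative log-likelihood of the gold class for each sentence.
    lp = log_softmax(logits)
    return -lp[np.arange(len(targets)), targets].mean()

class SharedEncoderMTL:
    """Hypothetical two-head model: a shared sentence encoder, a rhetorical-role
    head and a binary label-shift head (softmax heads stand in for CRF layers)."""

    def __init__(self, d_in, d_hidden, n_roles, rng):
        self.W_enc = rng.normal(scale=d_in ** -0.5, size=(d_in, d_hidden))
        self.W_role = rng.normal(scale=d_hidden ** -0.5, size=(d_hidden, n_roles))
        self.W_shift = rng.normal(scale=d_hidden ** -0.5, size=(d_hidden, 2))

    def encode(self, X):
        # Shared representation consumed by both heads during training.
        return np.tanh(X @ self.W_enc)

    def training_loss(self, X, role_targets, shift_targets, lam=0.5):
        H = self.encode(X)
        loss_role = cross_entropy(H @ self.W_role, role_targets)
        loss_shift = cross_entropy(H @ self.W_shift, shift_targets)
        # Joint objective: main task plus a weighted auxiliary task.
        return loss_role + lam * loss_shift

    def predict_roles(self, X):
        # At inference time only the rhetorical-role head is used.
        return (self.encode(X) @ self.W_role).argmax(axis=-1)

rng = np.random.default_rng(0)
model = SharedEncoderMTL(d_in=8, d_hidden=16, n_roles=4, rng=rng)
X = rng.normal(size=(5, 8))          # 5 sentences with 8-dim embeddings
roles = np.array([0, 0, 1, 2, 2])
shifts = np.array([0, 0, 1, 1, 0])   # 1 where the role changes from the previous sentence
print(model.training_loss(X, roles, shifts))
print(model.predict_roles(X))
```

Gradients from the auxiliary loss flow only into the shared encoder and the shift head, so the label-shift task shapes the shared embeddings without adding any cost at prediction time.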