
AugAbEx : Way Forward for Extractive Case Summarization

Purnima Bindal, Vikas Kumar, Sagar Rathore, Vasudha Bhatnagar

TL;DR

AugAbEx addresses the challenge of producing faithful extractive summaries of legal judgments by transforming existing abstractive gold-standard summaries into extractive equivalents. The authors propose a transparent pipeline that uses ROUGE-based candidate selection and Maximum Marginal Relevance (MMR) to form extractive summaries with $k=2$ and $\lambda=0.5$. They evaluate the results along domain, semantic, lexical, and structural dimensions, including comparisons with LSA and a human evaluation. The primary contributions are seven augmented datasets containing extractive gold standards and a multi-dimensional evaluation framework that supports training and benchmarking of extractive legal summarizers. The resource supports more faithful, domain-aware extractive summarization in the legal NLP community and makes it easier to grow such datasets in future research.
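The TL;DR names the two ingredients (ROUGE-based candidate selection and MMR with $k=2$, $\lambda=0.5$) but not how they are combined. The following is a minimal sketch under our own assumptions, not the authors' implementation: $k$ is read as the number of document sentences kept per abstractive summary sentence, $\lambda$ weights relevance to the whole OAG summary against redundancy with already selected sentences, similarity is a plain ROUGE-1 F1 without stemming or stopword removal, and selection stops once the extract reaches the OAG word count. All function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokenizer (simplifying assumption; no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1_f1(candidate: list[str], reference: list[str]) -> float:
    """Plain ROUGE-1 F-score: clipped unigram overlap, no stemming."""
    if not candidate or not reference:
        return 0.0
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def candidate_pool(doc_sents: list[str], abs_sents: list[str], k: int = 2) -> set[int]:
    """For each abstractive sentence, keep the k best-matching document sentences (assumed reading of k)."""
    doc_toks = [tokenize(s) for s in doc_sents]
    pool: set[int] = set()
    for a in abs_sents:
        a_toks = tokenize(a)
        scores = [rouge1_f1(t, a_toks) for t in doc_toks]
        ranked = sorted(range(len(doc_sents)), key=lambda i: scores[i], reverse=True)
        # Skip sentences with zero lexical overlap; they are not real candidates.
        pool.update(i for i in ranked[:k] if scores[i] > 0)
    return pool


def mmr_select(doc_sents: list[str], abs_sents: list[str], k: int = 2, lam: float = 0.5) -> list[str]:
    """Greedy MMR over the candidate pool until the extract reaches the abstract's word count."""
    doc_toks = [tokenize(s) for s in doc_sents]
    abs_toks = tokenize(" ".join(abs_sents))
    budget = len(abs_toks)  # assumed length budget: match the OAG summary length
    pool = candidate_pool(doc_sents, abs_sents, k)
    selected: list[int] = []
    length = 0
    while pool and length < budget:
        def mmr(i: int) -> float:
            relevance = rouge1_f1(doc_toks[i], abs_toks)
            redundancy = max((rouge1_f1(doc_toks[i], doc_toks[j]) for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(sorted(pool), key=mmr)  # sorted() makes ties resolve to the earliest sentence
        selected.append(best)
        pool.remove(best)
        length += len(doc_toks[best])
    # Restore document order so the extract reads like the judgment.
    return [doc_sents[i] for i in sorted(selected)]
```

On a toy judgment whose OAG summary mentions a conviction under Section 302, its affirmation by the High Court, and the dismissal of the appeal, this sketch selects the conviction sentence first. It then prefers the dismissal sentence over the High Court sentence, because the latter has higher raw relevance but also overlaps more with what is already selected. Lowering `lam` pushes the extract further toward diversity.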

Abstract

Summarization of legal judgments poses a heavy cognitive burden on law practitioners due to the complexity of the language, context-sensitive legal jargon, and the length of the documents. The automatic summarization of legal documents has therefore attracted serious attention from natural language processing researchers. Since abstractive summaries of legal documents generated by deep neural methods remain prone to misrepresenting nuanced legal jargon or overlooking key contextual details, we envisage a rising trend toward the use of extractive case summarizers. Given the high cost of human annotation for gold-standard extractive summaries, we engineer a light and transparent pipeline that leverages existing abstractive gold-standard summaries to create corresponding extractive gold-standard versions. The approach ensures that the experts' opinions ensconced in the original abstractive gold-standard summaries are carried over to the transformed extractive summaries. We augment seven existing case summarization datasets, which include abstractive summaries, with corresponding extractive summaries, creating an enriched data resource for the case summarization research community. To ensure the quality of the augmented extractive summaries, we perform an extensive comparative evaluation against the original abstractive gold-standard summaries covering structural, lexical, and semantic dimensions. We also compare the domain-level information of the two summaries. We commit to releasing the augmented datasets in the public domain for use by the research community and believe that the resource will offer opportunities to advance the field of automatic summarization of legal documents.

Paper Structure

This paper contains 23 sections, 3 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Pipeline to transform original abstractive gold (OAG) summary to transformed extractive gold (TEG) summary.
  • Figure 2: Automatic Evaluation Framework for Transformed Extractive Gold Summary
  • Figure 3: Statistical properties of the datasets. WC: Average word count (summary length), SC: Average number of sentences in the document, CD: Case documents, S: Summary, CR: Average compression ratio
  • Figure 4: Comparison of macro-averaged recall scores of provisions in the transformed extractive gold summaries for varying numbers of candidate sentences.
  • Figure 8: Comparison of the distribution of similarity scores of OAG and TEG summaries in the latent and embedding space of LegalBert.
  • ...and 1 more figure
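Figure 3 reports an average compression ratio (CR), and Figure 4 a macro-averaged recall of provisions; the captions give no formulas. The sketch below assumes CR is summary word count divided by document word count, averaged over cases, and that provision recall for one case is the fraction of statutory provisions cited in the OAG summary that also appear in the TEG summary, averaged over cases (macro-averaging). The regular expression for spotting provisions (e.g. "Section 302", "Article 21") and all function names are hypothetical simplifications, not the authors' extractor.

```python
import re
from statistics import mean

# Hypothetical pattern: matches references such as "Section 302", "section 34" or "Article 21".
PROVISION_RE = re.compile(r"\b(section|article)\s+(\d+[a-z]?)\b", re.IGNORECASE)


def word_count(text: str) -> int:
    """Whitespace word count (assumed definition of summary/document length)."""
    return len(text.split())


def average_compression_ratio(docs: list[str], summaries: list[str]) -> float:
    """Mean of (summary words / document words) over all cases."""
    return mean(word_count(s) / word_count(d) for d, s in zip(docs, summaries))


def provisions(text: str) -> set[tuple[str, str]]:
    """Case-normalised set of (kind, number) provisions cited in a text."""
    return {(kind.lower(), num.lower()) for kind, num in PROVISION_RE.findall(text)}


def macro_provision_recall(oag: list[str], teg: list[str]) -> float:
    """Average over cases of |P(OAG) & P(TEG)| / |P(OAG)|."""
    recalls = []
    for abstractive, extractive in zip(oag, teg):
        gold = provisions(abstractive)
        if gold:  # assumption: cases whose OAG cites no provision are skipped
            recalls.append(len(gold & provisions(extractive)) / len(gold))
    return mean(recalls) if recalls else 0.0
```

Macro-averaging gives every case equal weight, so a judgment citing one provision counts as much as one citing twenty; a micro-averaged variant would pool all provisions before dividing.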