AugAbEx: Way Forward for Extractive Case Summarization
Purnima Bindal, Vikas Kumar, Sagar Rathore, Vasudha Bhatnagar
TL;DR
AugAbEx tackles the challenge of producing faithful extractive summaries of legal judgments by transforming existing abstractive gold-standard summaries into extractive equivalents. The authors propose a transparent pipeline that uses ROUGE-based candidate selection and Maximum Marginal Relevance (MMR) to form the extractive summaries (with $k=2$ and $\lambda=0.5$). They evaluate the results along domain, semantic, lexical, and structural dimensions, including comparisons with LSA and a human evaluation. The primary contributions are seven augmented datasets containing extractive gold standards and a comprehensive multi-dimensional evaluation framework that supports robust training and benchmarking of extractive legal summarizers. The resource enables more faithful, domain-aware extractive summarization in the legal NLP community and supports scalable growth of datasets for future research.
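The TL;DR names the two ingredients of the transformation but not their exact wiring. The Python sketch below shows one plausible reading under stated assumptions: $k$ is taken to be the number of top-ROUGE document sentences kept as candidates for each abstractive summary sentence, relevance is a hand-rolled unigram ROUGE-1 F1 (not the official ROUGE package), and MMR then picks one candidate per abstractive sentence by scoring $\lambda \cdot \text{rel}(i) - (1-\lambda) \cdot \max_{j \in S} \text{sim}(i, j)$ against the already selected set $S$. The names `rouge1_f` and `build_extractive_summary` are hypothetical and are not the authors' code.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric runs (a deliberate simplification: no stemming).
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1_f(candidate: str, reference: str) -> float:
    """Unigram ROUGE-1 F1 computed from clipped token counts."""
    cand, ref = Counter(tokenize(candidate)), Counter(tokenize(reference))
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def build_extractive_summary(doc_sents: list[str], abs_sents: list[str],
                             k: int = 2, lam: float = 0.5) -> list[str]:
    """Hypothetical abstractive-to-extractive mapping: ROUGE top-k candidates, then MMR."""
    selected: list[int] = []
    for ref in abs_sents:
        # Relevance of every document sentence to this abstractive sentence.
        scores = [rouge1_f(s, ref) for s in doc_sents]
        ranked = sorted(range(len(doc_sents)), key=lambda i: scores[i], reverse=True)
        # ROUGE-based candidate selection: the k best sentences not yet chosen.
        candidates = [i for i in ranked if i not in selected][:k]
        if not candidates:
            break

        def mmr(i: int) -> float:
            # Redundancy = highest similarity to any sentence already selected.
            redundancy = max((rouge1_f(doc_sents[i], doc_sents[j]) for j in selected),
                             default=0.0)
            return lam * scores[i] - (1 - lam) * redundancy

        best = max(candidates, key=mmr)
        if scores[best] > 0:  # skip abstractive sentences with no lexical support
            selected.append(best)
    # Return the chosen sentences in their original document order.
    return [doc_sents[i] for i in sorted(selected)]


if __name__ == "__main__":
    judgment = [
        "The appellant was convicted under section 302 of the penal code.",
        "The court heard arguments on the admissibility of the confession.",
        "The weather on the day of the hearing was pleasant.",
        "The appeal is dismissed and the conviction is upheld.",
    ]
    gold_abstract = [
        "The appellant's conviction under section 302 was challenged.",
        "The court dismissed the appeal and upheld the conviction.",
    ]
    for sent in build_extractive_summary(judgment, gold_abstract):
        print(sent)
```

Returning sentences in document order keeps the extractive summary readable as a condensed judgment. With $\lambda=0.5$, relevance to the abstractive sentence and novelty relative to earlier picks are weighted equally; raising $\lambda$ would favour ROUGE relevance over diversity.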
Abstract
Summarization of legal judgments poses a heavy cognitive burden on law practitioners due to the complexity of the language, context-sensitive legal jargon, and the length of the documents. Therefore, the automatic summarization of legal documents has attracted serious attention from natural language processing researchers. Since abstractive summaries of legal documents generated by deep neural methods remain prone to misrepresenting nuanced legal jargon or overlooking key contextual details, we envisage a rising trend toward the use of extractive case summarizers. Given the high cost of human annotation for gold-standard extractive summaries, we engineer a lightweight and transparent pipeline that leverages existing abstractive gold-standard summaries to create corresponding extractive gold-standard versions. The approach ensures that the experts' opinions ensconced in the original abstractive gold-standard summaries are carried over to the transformed extractive summaries. We augment seven existing case summarization datasets, which include abstractive summaries, with corresponding extractive summaries, creating an enriched data resource for the case summarization research community. To ensure the quality of the augmented extractive summaries, we perform an extensive comparative evaluation against the original abstractive gold-standard summaries covering structural, lexical, and semantic dimensions. We also compare the domain-level information captured by the two kinds of summaries. We commit to releasing the augmented datasets in the public domain for use by the research community and believe that the resource will offer opportunities to advance the field of automatic summarization of legal documents.
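As a rough illustration of what the structural and lexical parts of such a comparison can look like, the Python sketch below computes each summary's compression ratio relative to the judgment, the length of the extractive summary relative to the abstractive one, and the Jaccard overlap of their vocabularies. These particular metrics and the `compare_summaries` helper are hypothetical choices for illustration, not the paper's evaluation suite; the semantic and domain-level comparisons would need embedding models or legal-entity extractors and are omitted here.

```python
import re


def tokens(text: str) -> list[str]:
    # Simple lowercase alphanumeric tokenizer, a stand-in for a legal-aware one.
    return re.findall(r"[a-z0-9]+", text.lower())


def compare_summaries(document: str, abstractive: str, extractive: str) -> dict[str, float]:
    """Hypothetical structural and lexical comparison of two gold summaries."""
    doc_t, abs_t, ext_t = tokens(document), tokens(abstractive), tokens(extractive)
    if not (doc_t and abs_t and ext_t):
        raise ValueError("document and both summaries must be non-empty")
    abs_vocab, ext_vocab = set(abs_t), set(ext_t)
    return {
        # Structural: how strongly each summary compresses the judgment.
        "abs_compression": len(abs_t) / len(doc_t),
        "ext_compression": len(ext_t) / len(doc_t),
        # Structural: extractive length relative to the abstractive original.
        "length_ratio": len(ext_t) / len(abs_t),
        # Lexical: Jaccard similarity of the two vocabularies.
        "vocab_jaccard": len(abs_vocab & ext_vocab) / len(abs_vocab | ext_vocab),
    }
```

A `length_ratio` near 1 and a high `vocab_jaccard` would suggest the extractive version stays close in size and wording to the expert-written abstract, though lexical overlap alone says little about whether the legal meaning is preserved.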
