
GLARE: Guided LexRank for Advanced Retrieval in Legal Analysis

Fabio Gregório, Rafaela Castro, Kele Belloze, Rui Pedro Lopes, Eduardo Bezerra

TL;DR

This work tackles the retrieval of relevant repetitive legal themes for Brazilian special appeals using an unsupervised framework. It introduces GLARE, which combines Guided LexRank-based extractive summarization with BM25-based similarity to a set of predefined themes, producing a ranked list of themes without requiring labeled training data. Using two publicly released corpora (special appeals and repetitive themes) and extensive experiments, the authors show that GLARE outperforms the Elasticsearch baseline, remains robust in low-data and zero-shot scenarios, and beats several supervised baselines on minority classes. The approach is scalable and partly domain-agnostic. It can speed up judicial classification while preserving interpretability, because a human analyst makes the final theme selection.
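The ranking step described above can be illustrated with a minimal Okapi BM25 sketch in Python, in which the special-appeal summary acts as the query and each theme description acts as a document. The theme identifiers and texts are invented placeholders, and the parameters k1 = 1.5 and b = 0.75 are common defaults assumed here, not values reported by the authors.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase word tokens; \w also matches accented letters common in Portuguese.
    return re.findall(r"\w+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of `query` against every document in `docs`."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n_docs
    df = Counter()  # number of documents containing each term
    for toks in tokenized:
        df.update(set(toks))
    q_terms = tokenize(query)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in q_terms:
            if term not in tf:
                continue
            # Non-negative IDF variant: log(1 + (N - n + 0.5) / (n + 0.5)).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def rank_themes(summary, themes):
    """Return (theme_id, score) pairs sorted by descending BM25 score."""
    ids = list(themes)
    scores = bm25_scores(summary, [themes[i] for i in ids])
    return sorted(zip(ids, scores), key=lambda pair: pair[1], reverse=True)


# Hypothetical theme descriptions and summary, for illustration only.
themes = {
    "T-1": "statute of limitations for water and sewage tariff collection",
    "T-2": "income tax on late-payment interest",
    "T-3": "interest capitalization in bank loan contracts",
}
summary = "The appeal disputes income tax charged on late-payment interest."
ranking = rank_themes(summary, themes)
print(ranking)
```

An analyst would then review the top entries of `ranking` and pick the final theme. Replacing BM25 with cosine similarity over TF-IDF vectors is the alternative similarity measure compared in Figure 5.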

Abstract

The Brazilian Constitution, known as the Citizen's Charter, provides mechanisms for citizens to petition the Judiciary, including the so-called special appeal. This type of appeal aims to standardize the interpretation of Brazilian federal legislation in cases where a court decision contradicts federal law. Handling special appeals is a daily task in the Judiciary and places a heavy, recurring workload on its courts. We propose a new method called GLARE, based on unsupervised machine learning, to help legal analysts classify a special appeal under one of the themes listed by the Superior Court of Justice (STJ). As part of this method, we propose a modification of the graph-based LexRank algorithm, called Guided LexRank, which generates a summary of the special appeal. The similarity between this summary and each theme is then measured with the BM25 algorithm, and the method outputs a ranking of the themes most appropriate to the analyzed appeal. The method requires no prior labeling of the text to be evaluated and removes the need for large volumes of data to train a model. We evaluate its effectiveness on a corpus of special appeals previously classified by human experts.
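LexRank scores sentences with a PageRank-style random walk over a graph whose edges link sentences with high TF-IDF cosine similarity. The abstract does not specify how Guided LexRank modifies this procedure, so the Python sketch below is an assumption, not the authors' method: it implements thresholded LexRank and adds guidance through a biased (personalized) teleport distribution, weighting each sentence by how many terms from a hypothetical guide-term list it contains. The threshold 0.1, damping 0.85, and all function names are illustrative choices.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase word tokens; \w also matches accented letters such as "ç" or "ã".
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(sentences):
    """Sparse TF-IDF vectors (dicts), treating each sentence as a document."""
    tokens = [tokenize(s) for s in sentences]
    n = len(tokens)
    df = Counter()
    for toks in tokens:
        df.update(set(toks))
    # Smoothed IDF so that words present in every sentence keep a small weight.
    idf = {w: math.log(n / df[w]) + 1.0 for w in df}
    return [{w: c * idf[w] for w, c in Counter(toks).items()} for toks in tokens]


def cosine(u, v):
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def lexrank(sentences, bias=None, damping=0.85, threshold=0.1, max_iter=200, tol=1e-10):
    """Thresholded LexRank scores with an optional (hypothetical) biased teleport vector."""
    n = len(sentences)
    vecs = tfidf_vectors(sentences)
    # Unweighted graph: edge i-j when cosine similarity exceeds the threshold (no self-loops).
    adj = [[1.0 if i != j and cosine(vecs[i], vecs[j]) > threshold else 0.0
            for j in range(n)] for i in range(n)]
    # Row-stochastic transitions; isolated sentences jump uniformly.
    trans = []
    for row in adj:
        deg = sum(row)
        trans.append([x / deg for x in row] if deg else [1.0 / n] * n)
    # Teleport: uniform (classic LexRank) or the normalized bias weights.
    total = sum(bias) if bias else 0.0
    tele = [w / total for w in bias] if total > 0 else [1.0 / n] * n
    p = [1.0 / n] * n
    for _ in range(max_iter):  # power iteration
        new = [(1 - damping) * tele[j] + damping * sum(p[i] * trans[i][j] for i in range(n))
               for j in range(n)]
        converged = sum(abs(a - b) for a, b in zip(new, p)) < tol
        p = new
        if converged:
            break
    return p


def summarize(sentences, guide_terms=None, k=2):
    """Return the k top-scoring sentences in their original document order."""
    bias = None
    if guide_terms:
        guide = {g.lower() for g in guide_terms}
        # Hypothetical guidance: count guide terms occurring in each sentence.
        bias = [sum(w in guide for w in tokenize(s)) for s in sentences]
    scores = lexrank(sentences, bias=bias)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


# Invented example sentences, for illustration only.
sentences = [
    "The appeal challenges income tax on late payment interest.",
    "Income tax on interest was charged by the federal revenue service.",
    "The lower court upheld the tax on the interest.",
    "The hearing was scheduled for the afternoon.",
]
print(summarize(sentences, guide_terms=["tax", "interest"], k=2))
```

With `guide_terms=None` the sketch reduces to classic LexRank with uniform teleportation. The biased variant shifts probability mass toward sentences containing the guide terms, which is one plausible way to steer an extractive summary toward theme-relevant content.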

Paper Structure (19 sections, 10 equations, 8 figures, 11 tables)

Figures (8)

  • Figure 1: PageRank is the basis for other graph-based algorithms such as LexRank. Source: Mehta, Parth and Prasenjit Majumder. From Extractive to Abstractive Summarization: A Journey (2019, p. 12).
  • Figure 2: Distribution of themes in the dataset
  • Figure 3: Steps of the proposed method.
  • Figure 4: Ranking quality of the theme suggestions produced by each summarization technique, measured by MAP and NDCG.
  • Figure 5: Model performance as a function of the number of sentences in the summary and of the similarity measure (BM25 or cosine) used to compare the special-appeal summary with each theme.
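Figure 4 reports ranking quality with MAP and NDCG. As a minimal Python sketch, assuming binary relevance (a theme is either correct for an appeal or not) and the standard log2 position discount, both metrics can be computed as follows. The optional cutoff `k` and the example rankings are hypothetical; the paper's exact evaluation protocol may differ.

```python
import math


def average_precision(ranked, relevant):
    """Mean of precision@k over the positions k where a relevant theme appears."""
    hits, total = 0, 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """`runs` is a list of (ranked_theme_ids, set_of_relevant_ids) pairs, one per appeal."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def ndcg(ranked, relevant, k=None):
    """Binary-relevance NDCG, optionally truncated at rank k."""
    if k is not None:
        ranked = ranked[:k]
    dcg = sum(1.0 / math.log2(i + 2) for i, item in enumerate(ranked) if item in relevant)
    ideal_hits = min(len(relevant), len(ranked))
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg else 0.0


# Hypothetical rankings for two appeals.
runs = [(["T-2", "T-3", "T-1"], {"T-3"}), (["T-1", "T-2"], {"T-1"})]
print(mean_average_precision(runs))          # 0.75
print(ndcg(["T-2", "T-3", "T-1"], {"T-3"}))  # 1 / log2(3), about 0.631
```

With a single correct theme per appeal, average precision reduces to the reciprocal rank of that theme, so MAP here coincides with mean reciprocal rank.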