GLARE: Guided LexRank for Advanced Retrieval in Legal Analysis
Fabio Gregório, Rafaela Castro, Kele Belloze, Rui Pedro Lopes, Eduardo Bezerra
TL;DR
This work addresses the retrieval of relevant repetitive legal themes for Brazilian special appeals with an unsupervised framework. It introduces GLARE, which combines Guided LexRank extractive summarization with BM25 similarity against a set of predefined themes, producing a ranked theme list without labeled training data. Using two publicly released corpora (special appeals and repetitive themes), the experiments show that GLARE outperforms an Elasticsearch baseline, remains robust in low-data and zero-shot scenarios, and outperforms several supervised baselines in minority-class settings. The approach is scalable and partly domain-agnostic, and it can speed up judicial classification while keeping the process interpretable through human-in-the-loop theme selection.
Abstract
The Brazilian Constitution, known as the Citizen's Charter, provides mechanisms for citizens to petition the Judiciary, including the so-called special appeal. This type of appeal aims to standardize the legal interpretation of Brazilian legislation in cases where a decision contradicts federal law. Handling special appeals is a daily task in the Judiciary and regularly places significant demands on its courts. We propose a new method called GLARE, based on unsupervised machine learning, to help the legal analyst classify a special appeal under a theme from a list made available by the National Court of Brazil (STJ). As part of this method, we propose a modification of the graph-based LexRank algorithm, which we call Guided LexRank. This algorithm generates a summary of the special appeal. The degree of similarity between the generated summary and each theme is then evaluated using the BM25 algorithm. As a result, the method produces a ranking of the themes most appropriate to the analyzed special appeal. The proposed method does not require prior labeling of the text to be evaluated and eliminates the need for large volumes of data to train a model. We evaluate the effectiveness of the method by applying it to a corpus of special appeals previously classified by human experts.
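
The ranking step described above (scoring each theme against the generated summary with BM25) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the whitespace tokenizer, the parameter values k1 = 1.5 and b = 0.75, the function name bm25_rank, and the toy theme and summary texts are all assumptions made for the example. The summary is taken as given rather than produced by Guided LexRank.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenization, for illustration only.
    return text.lower().split()


def bm25_rank(query, documents, k1=1.5, b=0.75):
    """Return (document index, BM25 score) pairs, best match first."""
    docs = [tokenize(d) for d in documents]
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    query_terms = set(tokenize(query))  # each distinct query term counted once

    scores = []
    for idx, doc in enumerate(docs):
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Non-negative IDF variant: ln((N - n + 0.5) / (n + 0.5) + 1).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append((idx, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: three invented themes and one invented summary.
themes = [
    "income tax on retirement benefits",
    "consumer contract bank interest rates",
    "social security contribution on vacation pay",
]
summary = "appeal disputes bank interest rates in a consumer loan contract"

ranking = bm25_rank(summary, themes)
best_theme = themes[ranking[0][0]]
```

Here the themes play the role of the document collection and the summary plays the role of the query, so the analyst receives the full sorted list and makes the final choice among the top-ranked themes.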
