Empirical analysis of binding precedent efficiency in Brazilian Supreme Court via case classification

Raphaël Tinarrage, Henrique Ennes, Lucas Resck, Lucas T. Gomes, Jean R. Ponciano, Jorge Poco

TL;DR

The paper empirically investigates binding precedents (BPs) in the Brazilian Supreme Court by treating Similar Case Retrieval as a case classification problem. It compares TF-IDF, LSTM, Longformer, and regex for identifying the documents to which a BP applies, using Dataset #1 for labeled evaluation and Dataset #2 for generalization. Although TF-IDF scores slightly higher on validation metrics, the deep learning models detect important legal events that TF-IDF misses. The legal analysis identifies five hypotheses for why BPs may fail to reduce repetitive demands and finds that the drivers of BP inefficacy are heterogeneous and case-dependent. It emphasizes the dynamic, context-dependent nature of jurisprudence and the need for broader data and methodological refinements to assess impact reliably.

Abstract

Binding precedents (súmulas vinculantes) constitute a juridical instrument unique to the Brazilian legal system, whose objectives include the protection of the Federal Supreme Court against repetitive demands. Studies of the effectiveness of these instruments in decreasing the Court's exposure to similar cases, however, indicate that they tend to fail in this respect, with some of the binding precedents seemingly creating new demands. We empirically assess the legal impact of five binding precedents, 11, 14, 17, 26, and 37, at the highest Court level through their effects on the legal subjects they address. This analysis is only possible through comparison with the Court's rulings on the precedents' themes before the precedents were created, which means that these decisions should be detected through techniques of Similar Case Retrieval, which we tackle from the angle of Case Classification. The contributions of this article are therefore twofold: on the mathematical side, we compare the use of different methods of Natural Language Processing -- TF-IDF, LSTM, Longformer, and regex -- for Case Classification, whereas on the legal side, we contrast the inefficiency of these binding precedents with a set of hypotheses that may justify their repeated usage. We observe that the TF-IDF models performed slightly better than LSTM and Longformer when compared through common metrics; however, the deep learning models were able to detect certain important legal events that TF-IDF missed. On the legal side, we argue that the reasons why binding precedents fail to respond to repetitive demands are heterogeneous and case-dependent, making it impossible to single out a specific cause. We identify five main hypotheses, which are found in different combinations in each of the precedents studied.

Paper Structure

This paper contains 71 sections, 3 equations, 23 figures, 5 tables.

Figures (23)

  • Figure 1: Histograms of the number of cases judged by the Federal Supreme Court citing Binding Precedents 11, 14, 17, 26, or 37, in our collection (Dataset #1). The bins have a length of one year, and the curves are obtained via quadratic spline interpolation. The dashed vertical lines represent the date of publication of each BP. We draw the reader's attention to the fact that they all exhibit an increasing trend.
  • Figure 2: Schematic overview of the article. To understand the dynamics behind the use of a precedent, we train the models on an initial set of labeled documents, then apply these models to a larger set of data, and represent the results as a time series.
  • Figure 3: Histograms of the number of cases in Dataset #1 (top) and Dataset #2 (bottom). The bins have a length of one year.
  • Figure 4: Distribution of length of documents in Dataset #1, for different preprocessings.
  • Figure 5: Number of documents predicted by each model for BP 11 in Dataset #2, represented as a histogram (window length of one year) and interpolated via quadratic spline. We give the predictions for the thresholds adapted to Dataset #1 (dashed) and Dataset #2 (solid), as given in the table of adjusted probabilities (\ref{tab:adjusted_probabilities}). Two views are given, the second one zooming in on the ordinate axis.
  • ...and 18 more figures
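
The captions of Figures 2 and 5 describe turning model predictions into a yearly time series under a chosen probability threshold. A minimal sketch of that counting step is given below. It is an illustration under stated assumptions, not the authors' implementation: the function name `yearly_positive_counts`, the `(decision_date, probability)` input format, the threshold values, and the toy scores are all hypothetical and do not come from the paper.

```python
from collections import Counter
from datetime import date


def yearly_positive_counts(predictions, threshold):
    """Count, per calendar year, the documents whose score reaches the threshold.

    predictions: iterable of (decision_date, probability) pairs.
    This input format is an assumption made for the sketch.
    """
    counts = Counter(
        decided.year for decided, prob in predictions if prob >= threshold
    )
    # Sort by year so the result reads as a time series.
    return dict(sorted(counts.items()))


# Made-up scores purely for illustration (not data from the paper).
toy_predictions = [
    (date(2007, 3, 1), 0.20),
    (date(2008, 9, 15), 0.65),
    (date(2009, 1, 10), 0.55),
    (date(2009, 6, 2), 0.91),
    (date(2010, 11, 30), 0.48),
]

# Two hypothetical thresholds, mimicking the two curves per model in Figure 5.
strict = yearly_positive_counts(toy_predictions, threshold=0.6)
lenient = yearly_positive_counts(toy_predictions, threshold=0.45)
```

Lowering the threshold can only add documents to each yearly bin, so the lenient curve lies on or above the strict one. This is one reason the choice of threshold matters when reading such time series.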