Legal Minds, Algorithmic Decisions: How LLMs Apply Constitutional Principles in Complex Scenarios

Camilla Bignotti, Carolina Camassa

TL;DR

The paper investigates GPT-4's alignment with constitutional principles in a dataset of Italian Constitutional Court bioethics rulings. It employs a contrastive learning-based embedding framework to quantify alignment as the cosine distance between GPT-4's arguments and those of the Applicant, the Court, and the State. Findings show a consistent tilt toward progressive interpretations, aligning most closely with the Applicant's positions and engaging only to a limited extent with competing values. The work highlights the need for careful testing and human supervision when deploying LLMs in legal decision-making and outlines directions for broader cross-jurisdictional evaluations.

Abstract

In this paper, we conduct an empirical analysis of how large language models (LLMs), specifically GPT-4, interpret constitutional principles in complex decision-making scenarios. We examine rulings from the Italian Constitutional Court on bioethics issues that involve trade-offs between competing values and compare model-generated legal arguments on these issues to those presented by the State, the Court, and the applicants. Our results indicate that GPT-4 consistently aligns more closely with progressive interpretations of the Constitution, often overlooking competing values and mirroring the applicants' views rather than the more conservative perspectives of the State or the Court's moderate positions. Our experiments reveal a distinct tendency of GPT-4 to favor progressive legal interpretations, underscoring the influence of underlying data biases. We thus underscore the importance of testing alignment in real-world scenarios and considering the implications of deploying LLMs in decision-making processes.

Paper Structure (32 sections, 1 equation, 4 figures, 3 tables)

Figures (4)

  • Figure 2: Results of the evaluation of GPT-4's argument extraction task from Section \ref{subsec:gpt-analyst}. The scores, given on a scale from 1 to 5 according to the rubric in Appendix C, show a consistently good performance of the model on the task.
  • Figure 3: Effect of finetuning an embedding model with a contrastive learning loss. Starting from a set of legal arguments, we create pairs $(a_1,a_2)$ of arguments made by different legal parties on the same case. Through a manual classification, the pairs are labeled as concordant (1) or opposing (0). The model is trained to optimize its embeddings by pushing further in vector space the pairs of arguments that are dissimilar, while moving closer the pairs that express similar interpretations of the law.
  • Figure 4: Panel A shows the distance between GPT-4's and the three legal parties' arguments on the set of constitutional principles cited in our case dataset. We see a consistent trend in which GPT-4 is closer to the Applicant's interpretations of the articles, which are usually more progressive. Panel B shows an example of how the distance between arguments is reflected in the different interpretations of the same article (Art. 2 on human rights) in a legal case on medically assisted procreation (PMA).
  • Figure 5: Each point in the plot shows the mean and deviation of the distance between GPT-4’s legal stance and the arguments of the Applicant, Court, and State over five iterations of the same prompt. For brevity, only Article 3 of the Constitution is shown. We observe that GPT-4's alignment remains mostly consistent across iterations, especially for the Court and State, while showing more variance in the distance from the Applicant’s position.
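
The quantities named in the captions of Figures 3 to 5 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the margin-based form of the contrastive loss, the `margin` value, the function names, and the reading of "deviation" as standard deviation are all assumptions made for illustration. Only the label convention (1 for concordant pairs, 0 for opposing pairs) and the use of cosine distance are taken from the text.

```python
import numpy as np


def cosine_distance(u, v):
    """Cosine distance 1 - cos(u, v) between two embedding vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return 1.0 - float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def contrastive_loss(a1, a2, label, margin=0.5):
    """Assumed margin-based contrastive loss for one pair (a1, a2).

    label = 1 (concordant): penalise any distance, pulling the pair together.
    label = 0 (opposing):   penalise only distances below the margin,
                            pushing the pair apart.
    """
    d = cosine_distance(a1, a2)
    if label == 1:
        return d ** 2
    return max(0.0, margin - d) ** 2


def summarize_runs(distances):
    """Mean and (assumed) standard deviation of distances over repeated runs."""
    arr = np.asarray(distances, dtype=float)
    return float(arr.mean()), float(arr.std())


if __name__ == "__main__":
    same = contrastive_loss([1.0, 0.0], [1.0, 0.0], label=1)
    opposed = contrastive_loss([1.0, 0.0], [1.0, 0.0], label=0)
    print(same, opposed)
    # Hypothetical distances from five repetitions of one prompt.
    print(summarize_runs([0.2, 0.4, 0.3, 0.3, 0.3]))
```

With this form, a concordant pair contributes no loss only when its embeddings coincide in direction, while an opposing pair stops contributing once its distance exceeds the margin. A lower distance between GPT-4's argument and a party's argument is then read as closer alignment with that party.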