
It Cannot Be Right If It Was Written by AI: On Lawyers' Preferences of Documents Perceived as Authored by an LLM vs a Human

Jakub Harasta, Tereza Novotná, Jaromir Savelka

TL;DR

This study examined whether the perception of legal documents by lawyers and law students varies based on their assumed origin (human-crafted vs AI-generated), revealing a clear preference for documents perceived as crafted by a human over those believed to be generated by AI.

Abstract

Large Language Models (LLMs) enable a future in which certain types of legal documents may be generated automatically. This has great potential to streamline legal processes, lower the cost of legal services, and dramatically increase access to justice. While many researchers focus on proposing and evaluating LLM-based applications supporting tasks in the legal domain, there is a notable lack of investigations into how legal professionals perceive content if they believe an LLM has generated it. Yet this is a critical point, as over-reliance or unfounded scepticism may influence whether such documents bring about appropriate legal consequences. This study offers a necessary analysis in the context of the ongoing transition towards mature generative AI systems. Specifically, we examined whether the perception of legal documents by lawyers and law students (n=75) varies based on their assumed origin (human-crafted vs AI-generated). The participants evaluated the documents, focusing on their correctness and language quality. Our analysis revealed a clear preference for documents perceived as crafted by a human over those believed to be generated by AI. At the same time, most participants expect a future in which documents will be generated automatically. These findings could be leveraged by legal practitioners, policymakers, and legislators to implement and adopt legal document generation technology responsibly and to fuel the necessary discussions on how legal processes should be updated to reflect recent technological developments.

Paper Structure

This paper contains 18 sections and 12 figures.

Figures (12)

  • Figure 1: The figure outlines the structure of the Brief (left) and the Verbose (right) documents. The structure contains the designation of parties, headline, acknowledgement of debt, origin of debt, due date, and confirmation of absence of duress.
  • Figure 2: The figure contains snippets of two variants of the Verbose document. One is designated as 'AI-GENERATED DOCUMENT' (left) in its header, and the other as 'HUMAN-CRAFTED DOCUMENT' (right). The same designation also appears in the footer of every variant.
  • Figure 3: The figure summarizes the participants' preferences between the two documents (Brief and Verbose). The top two charts show the distribution of scores awarded to each document in terms of their Correctness and Language Quality (1 = worst; 5 = best). The bottom two charts present the results of the side-by-side comparisons, showing how many times each of the documents was preferred (if any). Overall, a clear preference for the Verbose document over the Brief one can be observed.
  • Figure 4: The figure summarizes the participants' preferences between the documents when labeled as AI-generated versus human-crafted in terms of their Correctness. The top two charts show the distribution of scores awarded to the documents carrying the "AI" (green) or "human" (blue) labels (1 = worst; 5 = best). The bottom chart presents the results of the side-by-side comparison, showing how many times each of the labels was preferred (if any). Overall, a clear preference for the documents labeled as human-crafted over those labeled as AI-generated can be observed.
  • Figure 5: The figure summarizes the participants' preferences between the documents when labeled as AI-generated versus human-crafted in terms of their Correctness, broken down by document (Brief and Verbose). The top two charts show the distribution of scores awarded to the documents carrying the "AI" (green) or "human" (blue) labels (1 = worst; 5 = best). The bottom charts present the results of the side-by-side comparisons, showing how many times each of the labels was preferred (if any) per document. While the Verbose document is clearly preferred overall, the AI-generated label appears to largely mitigate that preference when attached to the Verbose document and to largely amplify it when attached to the Brief document.
  • ...and 7 more figures