Objection Overruled! Lay People can Distinguish Large Language Models from Lawyers, but still Favour Advice from an LLM
Eike Schneiders, Tina Seabrooke, Joshua Krook, Richard Hyde, Natalie Leesakul, Jeremie Clos, Joel Fischer
TL;DR
This study examines how lay people respond to LLM-generated versus lawyer-generated legal advice, focusing on two questions: how willing they are to act on advice when its source is unknown, and whether they can tell the two sources apart. Across three experiments (total N = 288), the authors find that participants are more willing to act on LLM-generated advice when the source is not disclosed, a result obtained in Experiment 1 and replicated in Experiment 2. In Experiment 3, participants nonetheless discriminated between the two sources above chance level (AUC = $0.59$, where $0.5$ corresponds to chance). The discussion links these effects to language complexity and potential overtrust in AI, and draws policy implications, including transparency and AI literacy strategies. The work underscores a practical risk: non-experts may overvalue AI-generated legal guidance when the source is opaque, even though they retain a partial ability to identify AI authorship. Overall, the findings motivate caution in deploying LLM-based legal assistance and point to avenues for improving transparency and user education.
Abstract
Large Language Models (LLMs) are seemingly infiltrating every domain, and the legal context is no exception. In this paper, we present the results of three experiments (total N = 288) that investigated lay people's willingness to act upon, and their ability to discriminate between, LLM- and lawyer-generated legal advice. In Experiment 1, participants judged their willingness to act on legal advice when the source of the advice was either known or unknown. When the advice source was unknown, participants indicated that they were significantly more willing to act on the LLM-generated advice. The result of the source unknown condition was replicated in Experiment 2. Intriguingly, despite participants indicating higher willingness to act on LLM-generated advice in Experiments 1 and 2, participants discriminated between the LLM- and lawyer-generated texts significantly above chance-level in Experiment 3. Lastly, we discuss potential explanations and risks of our findings, limitations and future work.
