Large Language Models in Cryptocurrency Securities Cases: Can a GPT Model Meaningfully Assist Lawyers?
Arianna Trozze, Toby Davies, Bennett Kleinberg
TL;DR
This study examines the practical utility of large language models for legal tasks in cryptocurrency securities cases. It combines two experiments built on real-world case material: one testing GPT-3.5's legal reasoning and one testing ChatGPT's drafting of pleadings. GPT-3.5 shows weak law-spotting ability with many false negatives: the violations it identifies tend to be correct, but it misses others that also apply. ChatGPT performs better at drafting, yielding complaints that mock jurors find convincing, and jurors' decisions are not statistically significantly associated with whether a lawyer or ChatGPT wrote the complaint. The findings suggest limited immediate value for LLMs in complex legal reasoning, but meaningful potential for drafting assistance that could reduce attorneys' time and increase document clarity, with important caveats and directions for future research. Overall, the work provides a first systematic look at LLMs in litigation contexts and crypto-securities law, informing both practitioners and researchers about capabilities, limits, and avenues for improvement.
Abstract
Large Language Models (LLMs) could be a useful tool for lawyers. However, empirical research on their effectiveness in conducting legal tasks is scant. We study securities cases involving cryptocurrencies as one of numerous contexts where AI could support the legal process, examining GPT-3.5's legal reasoning and ChatGPT's legal drafting capabilities. We ask a) whether GPT-3.5 can accurately determine which laws are potentially being violated from a fact pattern, and b) whether juror decision-making differs between complaints written by a lawyer and complaints written by ChatGPT. First, we fed fact patterns from real-life cases to GPT-3.5 and evaluated its ability to identify the correct potential violations from the scenario and exclude spurious ones. Second, we had mock jurors assess complaints written by ChatGPT and by lawyers. GPT-3.5's legal reasoning skills proved weak, though we expect improvement in future models, particularly given that the violations it suggested tended to be correct (it merely missed additional, correct violations). ChatGPT performed better at legal drafting, and jurors' decisions were not statistically significantly associated with the author of the document upon which they based their decisions. Because GPT-3.5 cannot satisfactorily conduct legal reasoning tasks, it is unlikely to help lawyers in a meaningful way at this stage. However, ChatGPT's drafting skills (though, perhaps, still inferior to lawyers') could assist lawyers in providing legal services. Our research is the first to systematically study an LLM's legal drafting and reasoning capabilities in litigation, as well as in securities law and cryptocurrency-related misconduct.
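
The distinction drawn above between suggesting correct violations and missing additional ones can be illustrated with a minimal sketch. This is not the authors' scoring procedure: the function name `score_violations`, the use of precision and recall as summary measures, and the placeholder violation labels are all assumptions made for illustration only.

```python
def score_violations(predicted, actual):
    """Compare model-suggested violations with those in the real complaint.

    Hypothetical helper for illustration; not taken from the paper.
    """
    predicted, actual = set(predicted), set(actual)
    true_pos = predicted & actual    # correct violations the model found
    false_pos = predicted - actual   # spurious violations the model added
    false_neg = actual - predicted   # correct violations the model missed
    precision = len(true_pos) / len(predicted) if predicted else 0.0
    recall = len(true_pos) / len(actual) if actual else 0.0
    return {
        "true_positives": sorted(true_pos),
        "false_positives": sorted(false_pos),
        "false_negatives": sorted(false_neg),
        "precision": precision,
        "recall": recall,
    }


# Placeholder labels, not real case data.
model_output = ["violation_A"]
case_ground_truth = ["violation_A", "violation_B", "violation_C"]
result = score_violations(model_output, case_ground_truth)
print(result["precision"], result["recall"])
```

In this made-up example every suggested violation is correct (high precision) but two applicable violations are missed (low recall), which mirrors the pattern the abstract describes for GPT-3.5.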
