Athena: Retrieval-augmented Legal Judgment Prediction with Large Language Models

Xiao Peng, Liang Chen

TL;DR

This work proposes "Athena", a novel framework that uses RAG as a core preprocessing component to improve LLMs' performance on specialized tasks, and shows that Athena significantly improves overall performance, achieving state-of-the-art results on the CAIL2018 dataset.

Abstract

Recently, large language models (LLMs) like ChatGPT, LLaMA, and Claude have been applied successfully in many domains, including legal scenarios. With LLMs' rapid technological progress, prompt engineering (PE), which serves as an interface between LLMs and real-world applications, has drawn wide attention from developers. Various PE methods have been proposed to overcome real-world challenges, such as few-shot prompting, chain-of-thought, and retrieval-augmented generation (RAG). However, RAG for legal judgment prediction (LJP) is still underexplored. To address this, we propose "Athena", a novel framework that uses RAG as a core preprocessing component to improve LLMs' performance on specialized tasks. Athena constructs a knowledge base of accusations and pairs it with a semantic retrieval mechanism based on vectorization. Our experiments show that Athena significantly improves overall performance, achieving state-of-the-art results on the CAIL2018 dataset. Our ablation study on the in-context window size parameter further reproduces LLMs' "lost-in-the-middle" phenomenon with a relative positional variation. With moderate hyper-parameter tuning, we achieve up to 95% accuracy. We also study the impact of query rewriting and data distribution, and suggest possible directions for future research based on these analyses.
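The retrieval step described in the abstract can be illustrated with a minimal sketch. It assumes a toy bag-of-words vectorizer and cosine similarity as stand-ins for whatever embedding model Athena actually uses; the knowledge-base entries, function names, and prompt wording are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical toy knowledge base: accusation name -> short description.
# The real knowledge base and its descriptions are not given in this summary.
KNOWLEDGE_BASE = {
    "theft": "secretly taking property belonging to another person",
    "robbery": "taking property from a person by violence or threat",
    "fraud": "obtaining property by deception or fabricated facts",
}


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(case_text, k=2):
    """Return the k accusation names most similar to the case description."""
    query = embed(case_text)
    scored = [(cosine(query, embed(desc)), name)
              for name, desc in KNOWLEDGE_BASE.items()]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]


def build_prompt(case_text, candidates, output_format):
    """Assemble a prompt from the case, retrieved candidates, and output format."""
    candidate_lines = "\n".join(f"- {name}" for name in candidates)
    return (
        f"Legal case:\n{case_text}\n\n"
        f"Candidate accusations:\n{candidate_lines}\n\n"
        f"Output format:\n{output_format}"
    )


case = "the defendant used violence and threat to take property from a person"
top_candidates = retrieve(case, k=2)
prompt = build_prompt(case, top_candidates,
                      "List legal norms and case facts, then the final judgment.")
```

Here `k` plays the role of the in-context window size: it controls how many retrieved candidates are placed into the prompt.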

Paper Structure

This paper contains 18 sections, 6 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Our framework "Athena" in prompting. Athena's prompt takes three inputs: a legal case, retrieved candidates, and an output format. The candidates are retrieved from the accusation knowledge base according to their similarity to the given legal case. The output format instructs the LLM how to structure its inference, for example by listing legal norms and case facts before the final judgment
  • Figure 2: The overall framework of Athena
  • Figure 3: The construction process of Athena's knowledge base
  • Figure 4: Demonstration of 4 different methods with LLMs for legal judgment prediction
  • Figure 5: Hit Rate curve for the original description and rewritten description. The x-axis represents different in-context window sizes $k$, and the y-axis represents the corresponding Hit Rate. For example, at Top-5 the rewritten description improves Hit Rate by approximately 10% over the original description. To reach a similar Hit Rate, the original description requires nearly twice the in-context window size of the rewritten description
  • ...and 1 more figure
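
The Hit Rate in Figure 5 can be read as a top-$k$ retrieval metric. The sketch below assumes the common definition, namely the fraction of cases whose gold accusation appears among the first $k$ retrieved candidates; the paper's exact definition is not stated in this summary, and the function name and toy data are hypothetical.

```python
def hit_rate_at_k(ranked_lists, gold_labels, k):
    """Fraction of cases whose gold label is among the top-k retrieved candidates.

    Assumption: this is the usual top-k hit rate; the paper may define it differently.
    """
    if len(ranked_lists) != len(gold_labels):
        raise ValueError("ranked_lists and gold_labels must have the same length")
    if not gold_labels:
        return 0.0
    hits = sum(1 for ranked, gold in zip(ranked_lists, gold_labels)
               if gold in ranked[:k])
    return hits / len(gold_labels)


# Hypothetical retrieval results for four cases (most similar candidate first).
ranked = [
    ["theft", "robbery", "fraud"],
    ["fraud", "theft", "robbery"],
    ["robbery", "fraud", "theft"],
    ["robbery", "theft", "fraud"],
]
gold = ["theft", "theft", "theft", "robbery"]

# Sweep the window size k, as on the x-axis of a Hit Rate curve.
curve = {k: hit_rate_at_k(ranked, gold, k) for k in (1, 2, 3)}
```

By construction the metric never decreases as $k$ grows, since a larger window can only add candidates. That makes the trade-off with the "lost-in-the-middle" effect noted in the abstract a matter of how well the LLM uses the extra candidates, not of retrieval coverage.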