CBR-RAG: Case-Based Reasoning for Retrieval Augmented Generation in LLMs for Legal Question Answering

Nirmalie Wiratunga, Ramitha Abeyratne, Lasal Jayawardena, Kyle Martin, Stewart Massie, Ikechukwu Nkisi-Orji, Ruvan Weerasinghe, Anne Liret, Bruno Fleisch

TL;DR

This work addresses the challenge of producing verifiable, evidence-grounded responses in legal QA by embedding a Case-Based Reasoning–driven retrieval module within Retrieval-Augmented Generation. It formalizes CBR-RAG to augment LLM prompts with context from a casebase built on ALQA and systematically compares representation and similarity strategies, including intra, inter, and hybrid retrieval with BERT, LegalBERT, and AnglEBERT embeddings. Empirical results show that hybrid AnglEBERT retrieval with full-case context yields the best semantic alignment and outperforms a No-RAG baseline, supporting the value of structured case-based retrieval for knowledge-intensive generation tasks. The findings suggest that carefully chosen domain-adapted and contrastive embeddings, together with hybrid retrieval, can significantly improve the quality and faithfulness of legal QA outputs, with practical implications for deploying evidence-grounded LLMs in legal contexts.

Abstract

Retrieval-Augmented Generation (RAG) enhances Large Language Model (LLM) output by providing prior knowledge as context to the input. This is beneficial for knowledge-intensive and expert-reliant tasks, including legal question-answering, which require evidence to validate generated text outputs. We highlight that Case-Based Reasoning (CBR) presents key opportunities to structure retrieval as part of the RAG process in an LLM. We introduce CBR-RAG, where the CBR cycle's initial retrieval stage, its indexing vocabulary, and its similarity knowledge containers are used to enhance LLM queries with contextually relevant cases. This integration augments the original LLM query, providing a richer prompt. We present an evaluation of CBR-RAG and examine different representations (i.e., general and domain-specific embeddings) and methods of comparison (i.e., inter, intra, and hybrid similarity) on the task of legal question-answering. Our results indicate that the context provided by CBR's case reuse enforces similarity between relevant components of the questions and the evidence base, leading to significant improvements in the quality of generated answers.
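
To make the comparison methods named in the abstract more concrete, the following is a minimal sketch of hybrid retrieval over a toy casebase. It rests on assumptions that the abstract does not spell out: "intra" is taken to mean comparing the query question with a stored case's question, "inter" is taken to mean comparing the query question with a stored case's supporting text, and "hybrid" is taken to be a weighted sum of the two cosine similarities. The names `hybrid_score`, `retrieve`, `question_vec`, `support_vec`, and `weight` are hypothetical, and the two-dimensional vectors stand in for real BERT, LegalBERT, or AnglEBERT embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hybrid_score(query_vec, case, weight=0.5):
    """Weighted mix of intra and inter similarity (assumed definitions, see lead-in)."""
    intra = cosine(query_vec, case["question_vec"])  # query question vs. stored question
    inter = cosine(query_vec, case["support_vec"])   # query question vs. stored support text
    return weight * intra + (1.0 - weight) * inter


def retrieve(query_vec, casebase, k=1, weight=0.5):
    """Return the k cases with the highest hybrid score."""
    ranked = sorted(
        casebase,
        key=lambda case: hybrid_score(query_vec, case, weight),
        reverse=True,
    )
    return ranked[:k]


# Toy casebase: hand-written vectors, not real embeddings.
casebase = [
    {"id": "case-a", "question_vec": [1.0, 0.0], "support_vec": [0.0, 1.0]},
    {"id": "case-b", "question_vec": [0.6, 0.8], "support_vec": [0.8, 0.6]},
]

query = [1.0, 0.0]
top_hybrid = retrieve(query, casebase, k=1, weight=0.5)
top_intra_only = retrieve(query, casebase, k=1, weight=1.0)
```

Setting `weight=1.0` reduces the score to intra similarity alone and `weight=0.0` to inter similarity alone, so the same function covers all three comparison modes under these assumptions. In a RAG pipeline, the retrieved cases would then be added to the LLM prompt as context.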

Paper Structure

This paper contains 17 sections, 8 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: CBR-RAG
  • Figure 2: Ten most frequent legal acts in the casebase are listed on the left, and the legal act frequency distribution appears on the right.
  • Figure 3: Architecture and training process for BERT and AnglEBERT. Note that LegalBERT has the same architecture as BERT, but is pre-trained on legal text.
  • Figure 4: Cosine similarity distribution for intra- and inter-embeddings.
  • Figure 5: F1 score for Retrieval@k