CBR-RAG: Case-Based Reasoning for Retrieval Augmented Generation in LLMs for Legal Question Answering

Nirmalie Wiratunga; Ramitha Abeyratne; Lasal Jayawardena; Kyle Martin; Stewart Massie; Ikechukwu Nkisi-Orji; Ruvan Weerasinghe; Anne Liret; Bruno Fleisch

TL;DR

This work addresses the challenge of producing verifiable, evidence-grounded responses in legal QA by embedding a Case-Based Reasoning–driven retrieval module within Retrieval-Augmented Generation. It formalizes CBR-RAG to augment LLM prompts with context from a casebase built on ALQA and systematically compares representation and similarity strategies, including intra, inter, and hybrid retrieval with BERT, LegalBERT, and AnglEBERT embeddings. Empirical results show that hybrid AnglEBERT retrieval with full-case context yields the best semantic alignment and outperforms No-RAG, supporting the value of structured case-based retrieval for knowledge-intensive generation tasks. The findings suggest that carefully chosen domain-adapted and contrastive embeddings, together with hybrid retrieval, can significantly improve the quality and faithfulness of legal QA outputs, with practical implications for deploying evidence-grounded LLMs in legal contexts.

Abstract

Retrieval-Augmented Generation (RAG) enhances Large Language Model (LLM) output by providing prior knowledge as context for the input. This is beneficial for knowledge-intensive and expert-reliant tasks, including legal question-answering, which require evidence to validate generated text. We highlight that Case-Based Reasoning (CBR) presents key opportunities to structure retrieval as part of the RAG process in an LLM. We introduce CBR-RAG, in which the initial retrieval stage of the CBR cycle, its indexing vocabulary, and its similarity knowledge containers are used to enhance LLM queries with contextually relevant cases. This integration augments the original LLM query, providing a richer prompt. We present an evaluation of CBR-RAG and examine different representations (i.e. general and domain-specific embeddings) and methods of comparison (i.e. inter, intra and hybrid similarity) on the task of legal question-answering. Our results indicate that the context provided by CBR's case reuse enforces similarity between relevant components of the questions and the evidence base, leading to significant improvements in the quality of generated answers.
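
The retrieval step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each case stores one embedding for its question and one for its supporting text, that intra similarity compares the query with the case question, that inter similarity compares the query with the supporting text, and that hybrid similarity is a weighted sum of the two. The `Case` fields, the `weight` parameter, the prompt layout, and the two-dimensional toy vectors are all hypothetical; in practice the vectors would come from an embedding model such as BERT, LegalBERT, or AnglEBERT.

```python
import math
from dataclasses import dataclass


@dataclass
class Case:
    """Hypothetical case structure: question, supporting text, answer, and embeddings."""
    question: str
    support: str
    answer: str
    q_vec: list  # embedding of the case question (toy values below)
    s_vec: list  # embedding of the supporting text (toy values below)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score(query_vec, case, mode="hybrid", weight=0.5):
    """Score a case under intra, inter, or hybrid similarity.

    Assumption: hybrid is a weighted sum of the intra and inter scores.
    """
    intra = cosine(query_vec, case.q_vec)  # query vs. case question
    inter = cosine(query_vec, case.s_vec)  # query vs. supporting text
    if mode == "intra":
        return intra
    if mode == "inter":
        return inter
    if mode == "hybrid":
        return weight * intra + (1.0 - weight) * inter
    raise ValueError(f"unknown mode: {mode}")


def retrieve(query_vec, casebase, k=1, mode="hybrid", weight=0.5):
    """Return the k highest-scoring cases for the query."""
    ranked = sorted(
        casebase,
        key=lambda case: score(query_vec, case, mode, weight),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, cases):
    """Prepend retrieved cases as context to the question (illustrative layout)."""
    context = "\n\n".join(
        f"Q: {c.question}\nEvidence: {c.support}\nA: {c.answer}" for c in cases
    )
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Toy casebase with placeholder text and hand-picked vectors (not real data).
CASEBASE = [
    Case("placeholder question A", "placeholder evidence A", "placeholder answer A",
         q_vec=[1.0, 0.0], s_vec=[0.0, 1.0]),
    Case("placeholder question B", "placeholder evidence B", "placeholder answer B",
         q_vec=[0.6, 0.8], s_vec=[0.8, 0.6]),
]
```

With the toy query vector `[1.0, 0.0]`, intra retrieval ranks case A first because its question embedding matches the query exactly, while inter and hybrid retrieval rank case B first because its supporting text is closer to the query. This shows how the choice of similarity mode can change which case is passed to `build_prompt` as context.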

Paper Structure (17 sections, 8 equations, 5 figures, 4 tables)

Introduction
Related Work
CBR-RAG: Using CBR to form context for LLMs
Casebase
Representation and Similarity
Representation
Case Retrieval
Embedding models
BERT
LegalBERT Trained on General Legal Data
BERT with AnglE Embeddings
Dual-embedding Case Representation with AnglE
Evaluation
Legal QA Dataset Analysis
Retrieval Analysis
...and 2 more sections

Figures (5)

Figure 1: CBR-RAG
Figure 2: Ten most frequent legal acts in the casebase are listed on the left, and the legal act frequency distribution appears on the right.
Figure 3: Architecture and training process for BERT and AnglEBERT. Note that LegalBERT has the same architecture as BERT, but is pre-trained on legal text.
Figure 4: Cosine similarity distribution for intra- and inter-embeddings.
Figure 5: F1 score for Retrieval@k
