Long-Context Long-Form Question Answering for Legal Domain

Anagha Kulkarni, Parin Rajesh Jhaveri, Prasha Shrestha, Yu Tong Han, Reza Amini, Behrouz Madahian

TL;DR

This work tackles the challenge of long-context, long-form question answering in the legal domain by introducing LCLF-QA, a system that combines domain-specific query rewriting, layout-aware chunking, and a domain-aware generator. Built on and extending LongRAG with a long-context extractor and CoT-based filtering, the approach aims to preserve structural cues from legal documents and interpret specialized vocabulary. A curated QA dataset (546 pairs, including SME-sourced and synthetic examples) and a novel coverage metric are used to evaluate performance, showing statistically significant gains over vanilla RAG and LongRAG baselines. The results demonstrate improved recall and coverage, driven by the three core components, and reveal practical limitations and future directions for applying the approach to broader domains.

Abstract

Legal documents have complex layouts involving multiple nested sections and lengthy footnotes, and they use specialized linguistic devices such as intricate syntax and domain-specific vocabulary to ensure precision and authority. These inherent characteristics make question answering challenging, particularly when the answer spans several pages (i.e., requires long context) and must be comprehensive (i.e., a long-form answer). In this paper, we address the challenges of long-context question answering in the context of long-form answers, given the idiosyncrasies of legal documents. We propose a question answering system that can (a) deconstruct domain-specific vocabulary for better retrieval from source documents, (b) parse complex document layouts while isolating sections and footnotes and linking them appropriately, and (c) generate comprehensive answers using precise domain-specific vocabulary. We also introduce a coverage metric that classifies performance into recall-based coverage categories, allowing human users to evaluate recall with ease. We curate a QA dataset by leveraging the expertise of professionals from fields such as law and corporate tax. Through comprehensive experiments and ablation studies, we demonstrate the usability and merit of the proposed system.

Paper Structure (29 sections, 1 equation, 20 figures, 2 tables)

This paper contains 29 sections, 1 equation, 20 figures, and 2 tables.

Figures (20)

  • Figure 1: Overview of LCLF-QA ingestion: Layouts of legal documents are parsed into page headers, page footers, sections, and footnotes. Page headers and footers are filtered out; sections and footnotes are used to create parent and child chunks: (a) Each reasonably sized section becomes a parent chunk and is divided into child chunks of appropriate lengths. (b) Footnotes on a page are grouped as a child chunk and linked to the parent chunks on that page.
  • Figure 2: Overview of LCLF-QA inference: The domain-specific query rewriter improves retrieval by reducing query ambiguity. The query and its rewrites retrieve relevant child chunks. During semantic expansion, retrieved footnote chunks are linked to parent chunks, and the parents are injected with footnote content. Retrieved child chunks are sent to the CoT filter, and parent chunks to the extractor. Their outputs are used by the domain-specific reader to form an answer.
  • Figure 3: Domain-Specific Query Rewriter
  • Figure 4: Retrieved footnote-based child chunks are used to retrieve the linked section-based parent chunks. These section-based parent chunks go through footnote enrichment, where the footnote is appended using tags.
  • Figure 5: Prompt for the long context extractor
  • ...and 15 more figures
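
The parent/child chunking and footnote enrichment described in the captions of Figures 1 and 4 can be illustrated with a minimal sketch. This is not the authors' implementation: the `Chunk` class, the `ingest` and `expand` functions, the word-based splitter, the 50-word child size, and the `<footnote>` tag name are all assumptions made for illustration, and retrieval itself is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    chunk_id: str
    text: str
    page: int
    kind: str  # "parent", "child", or "footnote"
    parent_ids: list = field(default_factory=list)


def split_text(text, max_words):
    """Naive word-count splitter (an assumed stand-in for the real chunker)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def ingest(sections, footnotes, child_words=50):
    """sections and footnotes are lists of (page, text) pairs."""
    parents, children = {}, []
    # (a) Each section becomes a parent chunk, split into child chunks.
    for s_idx, (page, text) in enumerate(sections):
        pid = f"sec{s_idx}"
        parents[pid] = Chunk(pid, text, page, "parent")
        for c_idx, piece in enumerate(split_text(text, child_words)):
            children.append(Chunk(f"{pid}-c{c_idx}", piece, page, "child", [pid]))
    # (b) Footnotes on a page are grouped into one child chunk and
    #     linked to every parent chunk on that page.
    by_page = {}
    for page, text in footnotes:
        by_page.setdefault(page, []).append(text)
    for page, notes in by_page.items():
        linked = [pid for pid, p in parents.items() if p.page == page]
        children.append(Chunk(f"fn-p{page}", " ".join(notes), page, "footnote", linked))
    return parents, children


def expand(retrieved, parents):
    """Map retrieved child chunks to parent texts; append footnote text in tags."""
    order, notes_for = [], {}
    for chunk in retrieved:
        for pid in chunk.parent_ids:
            if pid not in order:
                order.append(pid)
            if chunk.kind == "footnote":
                notes_for.setdefault(pid, []).append(chunk.text)
    enriched = {}
    for pid in order:
        text = parents[pid].text
        for note in notes_for.get(pid, []):
            text += f"\n<footnote>{note}</footnote>"  # hypothetical tag name
        enriched[pid] = text
    return enriched


# Toy usage: pretend the retriever returned only the page-1 footnote chunk.
sections = [
    (1, "Section 1. The taxpayer shall file a return."),
    (2, "Section 2. Penalties apply to late filings."),
]
footnotes = [
    (1, "1. See the exception for small entities."),
    (1, "2. Return means annual return."),
]
parents, children = ingest(sections, footnotes)
retrieved = [c for c in children if c.kind == "footnote"]
enriched = expand(retrieved, parents)
```

In this toy run, the single footnote chunk for page 1 pulls in only the section on that page, and that section's text is returned with the grouped footnote appended inside the assumed tags. Linking by page number is the simplest reading of the Figure 1 caption; a real system would likely need finer-grained anchoring.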