1-800-SHARED-TASKS at RegNLP: Lexical Reranking of Semantic Retrieval (LeSeR) for Regulatory Question Answering

Jebish Purbey, Drishti Sharma, Siddhant Gupta, Khawaja Murad, Siddartha Pullakhandam, Ram Mohan Rao Kadiyala

TL;DR

Problem: improve retrieval and answer generation for regulatory information in complex documents. Approach: LeSeR, a lexical-semantic hybrid retriever that separates dense semantic retrieval from lexical BM25 reranking, trained on ObliQA for ADGM regulations. Findings: LeSeR reaches Recall@10 of 0.8201 and mAP@10 of 0.6655 for retrieval and, with Qwen2.5 7B generating answers from the retrieved passages, achieves top generation quality (RePASs 0.4340). Significance: demonstrates effective retrieval-augmented generation for regulatory compliance workflows and points to domain adaptation and robustness as future work.
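The two-stage idea summarised above (dense retrieval to select candidates, then BM25 to reorder them) can be illustrated with a minimal Python sketch. This is not the authors' implementation: `toy_embed` is a hypothetical hashed bag-of-words stand-in for a fine-tuned embedding model, the names `dense_retrieve`, `bm25_scores`, and `leser_rank` are invented for illustration, the BM25 parameters are common defaults, and computing BM25 statistics over the candidate pool only is an assumption.

```python
import math
import zlib
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification)."""
    return text.lower().split()


def toy_embed(text, dim=256):
    """Hypothetical stand-in for a dense encoder: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * dim
    for tok in tokenize(text):
        vec[zlib.crc32(tok.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def dense_retrieve(query, passages, k):
    """Stage 1: indices of the k passages with the highest cosine similarity."""
    q = toy_embed(query)
    sims = [sum(a * b for a, b in zip(q, toy_embed(p))) for p in passages]
    return sorted(range(len(passages)), key=lambda i: sims[i], reverse=True)[:k]


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each doc; statistics come from `docs` only (assumption)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n if n else 0.0
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


def leser_rank(query, passages, k_dense=20, k_final=10):
    """Stage 2: reorder the dense candidates by BM25 and keep the top k_final."""
    candidates = dense_retrieve(query, passages, k_dense)
    scores = bm25_scores(query, [passages[i] for i in candidates])
    order = sorted(range(len(candidates)), key=lambda j: scores[j], reverse=True)
    return [candidates[j] for j in order[:k_final]]
```

In this sketch the dense stage decides which passages are considered at all, while the lexical stage only decides their final order, so a passage that the dense stage misses cannot be recovered by BM25.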

Abstract

This paper presents the system description of our entry for the COLING 2025 RegNLP RIRAG (Regulatory Information Retrieval and Answer Generation) challenge, focusing on information retrieval and answer generation techniques in regulatory domains. We experimented with a combination of embedding models, including Stella, BGE, CDE, and MPNet, and used fine-tuning and reranking to place relevant documents in the top ranks. Our novel approach, LeSeR, achieved competitive retrieval results, with a Recall@10 of 0.8201 and mAP@10 of 0.6655. This work highlights the potential of natural language processing techniques in regulatory applications, offering insights into their capabilities for implementing a retrieval-augmented generation system while identifying areas for future improvement in robustness and domain adaptation.
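For readers unfamiliar with the two retrieval metrics quoted above, the following Python sketch shows one common way to compute them. The function names are hypothetical, and the normalisation of average precision by min(number of relevant passages, k) is an assumption; conventions differ between toolkits, and the challenge's official scorer may use a different one.

```python
def recall_at_k(ranked, relevant, k=10):
    """Fraction of the relevant passage IDs that appear in the top k results."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def average_precision_at_k(ranked, relevant, k=10):
    """Average of precision values at each rank where a relevant passage occurs.

    Assumes `ranked` has no duplicate IDs. Normalised by min(|relevant|, k),
    which is one of several conventions in use (assumption).
    """
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, pid in enumerate(ranked[:k], start=1):
        if pid in relevant:
            hits += 1
            total += hits / rank
    return total / min(len(relevant), k)


def mean_average_precision(runs, k=10):
    """Mean of average_precision_at_k over (ranked, relevant) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision_at_k(r, rel, k) for r, rel in runs) / len(runs)
```

For example, with a ranking `["p3", "p1", "p7"]` and relevant set `{"p1", "p9"}`, one of the two relevant passages is retrieved, so recall is 0.5; the single hit sits at rank 2, giving an average precision of 0.25 under the normalisation assumed here.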

Paper Structure

This paper contains 5 sections, 1 figure, 3 tables.

Figures (1)

  • Figure 1: System design workflow