
From RAG to QA-RAG: Integrating Generative AI for Pharmaceutical Regulatory Compliance Process

Jaewoong Kim, Moohong Min

TL;DR

The paper addresses the difficulty of navigating extensive pharmaceutical regulatory guidelines by introducing QA-RAG, a retrieval-augmented generation system that couples guideline retrieval with synthetic question-answer generation. It uses a dual-track retrieval strategy: documents are retrieved with both the user query and a hypothetical answer generated by a fine-tuned LLM, then passed through a BGE reranker, and the final answer is produced with a few-shot prompt. The system is evaluated on FDA/ICH-themed datasets. QA-RAG achieves higher context precision and recall and better final-answer quality than the baselines, which highlights the value of combining domain-tuned retrieval with answer-driven document selection. The work has implications for streamlining regulatory compliance, reducing dependence on specialized human expertise, and extending the approach to other high-stakes, domain-specific settings. The data and methods are publicly released.
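
The dual-track retrieval described above can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the authors' implementation: `hypothetical_answer` is a hard-coded stand-in for the fine-tuned LLM, a bag-of-words cosine similarity replaces both the dense retriever and the BGE reranker, and scoring the merged candidates against the question in the rerank step is an assumption. All function names and parameters (`k_question`, `k_answer`, `k_final`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Bag-of-words cosine similarity (toy stand-in for a learned scorer)."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


def top_k(query, docs, k):
    """Return the k documents most similar to the query."""
    return sorted(docs, key=lambda d: cosine(query, d), reverse=True)[:k]


def hypothetical_answer(question):
    """Hypothetical placeholder for the fine-tuned LLM's draft answer."""
    return (
        "Stability testing should follow the storage conditions "
        "described in the guideline."
    )


def qa_rag_retrieve(question, docs, k_question=2, k_answer=2, k_final=2):
    """Dual-track retrieval followed by a rerank over the merged candidates."""
    draft = hypothetical_answer(question)
    # Track 1: retrieve with the user question.
    # Track 2: retrieve with the hypothetical answer.
    candidates = top_k(question, docs, k_question) + top_k(draft, docs, k_answer)
    # Drop duplicates while preserving order.
    unique = list(dict.fromkeys(candidates))
    # Rerank (assumption: candidates are scored against the question).
    reranked = sorted(unique, key=lambda d: cosine(question, d), reverse=True)
    return reranked[:k_final]


docs = [
    "Stability testing of new drug substances requires defined storage conditions.",
    "Labeling requirements for over-the-counter products.",
    "Storage conditions for stability studies are described in the guideline.",
    "Clinical trial design considerations for pediatric populations.",
]
question = "What storage conditions apply to stability testing?"
selected = qa_rag_retrieve(question, docs)
```

In a full pipeline, the documents in `selected` would be placed into a few-shot prompt together with the question to produce the final answer.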

Abstract

Regulatory compliance in the pharmaceutical industry entails navigating through complex and voluminous guidelines, often requiring significant human resources. To address these challenges, our study introduces a chatbot model that utilizes generative AI and the Retrieval Augmented Generation (RAG) method. This chatbot is designed to search for guideline documents relevant to the user inquiries and provide answers based on the retrieved guidelines. Recognizing the inherent need for high reliability in this domain, we propose the Question and Answer Retrieval Augmented Generation (QA-RAG) model. In comparative experiments, the QA-RAG model demonstrated a significant improvement in accuracy, outperforming all other baselines including conventional RAG methods. This paper details QA-RAG's structure and performance evaluation, emphasizing its potential for the regulatory compliance domain in the pharmaceutical industry and beyond. We have made our work publicly available for further research and development.

Paper Structure (44 sections, 3 equations, 2 figures, 5 tables)