Towards a RAG-based Summarization Agent for the Electron-Ion Collider

Karthik Suresh; Neeltje Kackar; Luke Schleck; Cristiano Fanelli

TL;DR

This paper presents RAGS4EIC, a Retrieval Augmented Generation (RAG) summarization agent for the Electron-Ion Collider (EIC) that addresses information overload by grounding LLM-generated summaries in a curated knowledge base. It describes a two-step pipeline: indexing a vectorized knowledge base built by ingesting EIC arXiv content, then generating concise, citation-rich answers with an LLM inside a LangChain workflow. The authors build synthetic benchmark datasets with GPT-4, evaluate the agent with standard metrics and RAG assessments (RAGAs), and report promising results with low hallucination rates and strong claim recognition. A web demonstration walks through the end-to-end workflow and highlights the value of domain-specific data curation and prompt tuning for scalable, trustworthy information access in the EIC community.

Abstract

The volume and complexity of information produced by large-scale experiments, including documents, papers, data, and other resources, take significant time and effort to navigate. Accessing and using this material is daunting, particularly for new collaborators and early-career scientists. To address this, a Retrieval Augmented Generation (RAG)-based Summarization AI for the Electron Ion Collider (EIC), RAGS4EIC, is under development. The AI agent not only condenses information but also cites the relevant sources in its responses, a substantial advantage for collaborators. The project takes a two-step approach: first, querying a comprehensive vector database containing all pertinent experiment information; second, using a Large Language Model (LLM) to generate concise summaries enriched with citations, based on the user query and the retrieved data. We describe evaluation methods that use RAG assessments (RAGAs) scoring mechanisms to assess the effectiveness of responses. We also describe prompt-template-based instruction tuning, which provides flexibility and accuracy in summarization. The implementation relies on LangChain, which serves as the foundation of the entire workflow. This integration supports efficiency and scalability, and eases deployment and access for various user groups within the EIC community. The framework simplifies the understanding of vast datasets and encourages collaborative participation. As a demonstration, a web application has been developed that explains each stage of the RAG agent's development in detail.
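The two-step approach described in the abstract (retrieve from a vector store, then prompt an LLM with the retrieved context and a citation instruction) can be illustrated with a minimal, dependency-free sketch. This is not the authors' LangChain implementation: the corpus, the source ids, the bag-of-words `embed` function, and the prompt wording are all hypothetical stand-ins, and the final LLM call is left out.

```python
import math
import re
from collections import Counter

# Hypothetical placeholder corpus of (source id, text) pairs; not real EIC content.
DOCS = [
    ("doc-1", "The tracking detector measures charged particle momentum."),
    ("doc-2", "The calorimeter measures the energy of electrons and photons."),
    ("doc-3", "Particle identification separates pions from kaons."),
]

def embed(text):
    """Toy bag-of-words vector; a real system would use a learned embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def retrieve(query, docs, k=2):
    """Step 1: rank documents by similarity to the query and keep the top k."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d[1])), reverse=True)
    return ranked[:k]

def build_prompt(query, hits):
    """Step 2: fill a response template with the retrieved context and a citation rule."""
    context = "\n".join(f"[{source_id}] {text}" for source_id, text in hits)
    return (
        "Answer the question using only the context below. "
        "Cite sources by their bracketed id.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

query = "What measures the energy of electrons?"
hits = retrieve(query, DOCS, k=2)
prompt = build_prompt(query, hits)  # this string would be sent to the LLM
```

Keeping the source id next to each retrieved passage is what lets the generated summary carry citations back to the knowledge base.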

Paper Structure (14 sections, 5 figures, 3 tables)

This paper contains 14 sections, 5 figures, 3 tables.

Background
The Electron Ion Collider (EIC):
Fine tuning of Large Language Models (LLMs):
Retrieval Augmented Generation pipeline
Creation of Knowledge base
EIC arXiv dataset:
Ingestion:
Inference
Evaluation of the EIC-RAG
LLM-assisted creation of benchmark datasets:
Performance of RAG agent on standard metrics:
LLM-based metrics for evaluation - RAGAs:
Conclusions
Appendix

Figures (5)

Figure 1: In a naive RAG agent, the user submits a prompt and the agent queries an external knowledge base for details related to the query. The retrieved information is inserted into a predefined response template, which is passed as input to a frozen LLM such as OpenAI's GPT, Anthropic's Claude, or Meta's LLaMA2 to summarize the result. The RAG agent operates within an LLM engineering platform such as LangChain, LangFuse, or LlamaIndex.
Figure 2: Creating a knowledge base involves ingesting data in varied formats. EIC data, often unstructured yet tagged (e.g., wikis, run logs), includes untagged PDFs. Despite Optical Character Recognition (OCR) and Deep Learning models for text conversion, extracting figures and images from PDFs remains challenging, complicating the development of a multi-modal pipeline.
Figure 3: When a user submits a question and selects both the metric and search configuration for vector searching, the query first passes through a decision chain. The LLM evaluates whether to consult the knowledge base for the answer. If more information is needed, it searches the vector database for additional context and sources. This information is processed through a fine-tuned template to craft a response with citations. The LLM then ensures syntax accuracy, delivering a GitHub markdown formatted answer. ChatGPT was used for this workflow with GPT-3.5 (gpt-3.5-turbo-1106) as the LLM running on the backend.
Figure 4: An example of creating an LLM-assisted Question & Answer dataset. Within a web application, an "annotator" requests a question with three claims, generating a detailed JSON object that includes the number of claims, each claim, the ideal response for each claim, and a comprehensive response incorporating all claims. This process yields a rich, high-quality dataset for evaluation, enabling even newcomers to generate datasets with LLM assistance. The "annotator" can create multiple question-answer pairs and select specific ones for the dataset.
Figure 5: The illustration outlines the RAGS4EIC inference process: (a) shows a user query and the agent's decision on accessing the knowledge base, proceeding as detailed in Fig. 3 and culminating in the output with a tracking link. (b) displays the query's chronological record in LangSmith, with logs maintained for up to 14 days before archival for possible future retrieval.
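The benchmark record described in the Figure 4 caption (number of claims, each claim, an ideal response per claim, and a comprehensive response) might be represented and checked as sketched below. The field names (`n_claims`, `claims`, `ideal_response`, `complete_response`) and the placeholder values are assumptions made for illustration; the actual schema used by RAGS4EIC is not given here.

```python
import json

# Hypothetical record layout; field names and values are illustrative placeholders.
RAW_RECORD = """
{
  "question": "<question text>",
  "n_claims": 3,
  "claims": [
    {"claim": "<claim 1>", "ideal_response": "<ideal response to claim 1>"},
    {"claim": "<claim 2>", "ideal_response": "<ideal response to claim 2>"},
    {"claim": "<claim 3>", "ideal_response": "<ideal response to claim 3>"}
  ],
  "complete_response": "<response covering all three claims>"
}
"""

def validate_record(record):
    """Check that a benchmark record is internally consistent before it is kept."""
    claims = record["claims"]
    if record["n_claims"] != len(claims):
        raise ValueError("n_claims does not match the number of claims listed")
    for item in claims:
        if not item.get("claim") or not item.get("ideal_response"):
            raise ValueError("every claim needs text and an ideal response")
    if not record.get("complete_response"):
        raise ValueError("missing the comprehensive response")
    return True

record = json.loads(RAW_RECORD)
is_valid = validate_record(record)
```

A check of this kind gives the annotator a simple way to reject malformed LLM output before selecting question-answer pairs for the dataset.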
