A Reliable Knowledge Processing Framework for Combustion Science using Foundation Models

Vansh Sharma; Venkat Raman

TL;DR

This research explores the integration of large language models (LLMs) into scientific data assimilation, focusing on combustion science as a case study, and introduces a custom workflow developed with a detection algorithm to filter out inaccuracies.

Abstract

This research explores the integration of large language models (LLMs) into scientific data assimilation, using combustion science as a case study. Building on foundation models integrated with a Retrieval-Augmented Generation (RAG) framework, the study introduces an approach for processing diverse combustion research data spanning experimental studies, simulations, and literature. Because combustion research is multifaceted, knowledge processing plays a critical role in navigating and extracting valuable information from a vast and diverse pool of sources. The developed approach minimizes computational and economic expense while optimizing data privacy and accuracy. It incorporates prompt engineering and offline open-source LLMs, giving users autonomy in selecting base models. The study examines text segmentation strategies, compares LLMs, and explores various optimized prompts to demonstrate the effectiveness of the framework. By incorporating an external database, the framework outperforms a conventional LLM in generating accurate responses and constructing robust arguments. The study also investigates optimized prompt templates for efficient knowledge extraction from scientific literature. To address concerns about hallucinations and false research articles, it introduces a custom workflow with a detection algorithm that filters out inaccuracies. Despite identified areas for improvement, the framework consistently delivers accurate domain-specific responses with minimal human oversight. The prompt-agnostic approach introduced holds promise for future deliberations. The study underscores the significance of integrating LLMs and knowledge processing techniques in scientific research, providing a foundation for advancements in data assimilation and utilization.
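
The text segmentation strategies mentioned in the abstract (studied in the paper under chunk size and chunk overlap) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `chunk_text`, the character-based splitting, and the parameter names are assumptions made here for illustration only. The paper may segment by tokens or sentences instead.

```python
def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into fixed-size character windows that share `overlap` characters.

    Hypothetical helper for illustration; a real pipeline might split on
    tokens or sentence boundaries before embedding each chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text,
        # so no trailing chunk consisting only of overlap is emitted.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    sample = "Detonation waves propagate at supersonic speed through reactive mixtures."
    for piece in chunk_text(sample, chunk_size=30, overlap=10):
        print(repr(piece))
```

A larger overlap repeats more context between neighboring chunks, which can help retrieval when an answer straddles a chunk boundary, at the cost of storing and embedding more text.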

Paper Structure (17 sections, 1 equation, 11 figures, 2 tables)

Introduction
Method
Document Ingestion and Database Creation
Querying and generative agent
Optimal User Prompts
Results
Effect of Chunk Size
Effect of Chunk Overlap
Demonstration of Knowledge Extraction
Vanilla Model vs Framework-based Model
Optimized Prompts Strategies
Beyond Optimized Prompts
Conclusions
Acknowledgments
Literature for ODW Database
...and 2 more sections

Figures (11)

Figure 1: Radar diagram comparing different LLM optimization strategies for the information extraction task. The current work focuses on RAG integrated with prompt engineering as the strategy for adapting language models to specific science domains.
Figure 2: Process workflow for information retrieval and querying.
Figure 3: Embedding documents using multi-processing framework to persist in a database.
Figure 4: Workflow for generative answering process.
Figure 5: Optimal prompt stencil structures for knowledge extraction.
...and 6 more figures
