NANOGPT: A Query-Driven Large Language Model Retrieval-Augmented Generation System for Nanotechnology Research

Achuth Chandrasekhar, Omid Barati Farimani, Olabode T. Ajenifujah, Janghoon Ock, Amir Barati Farimani

TL;DR

NANOGPT introduces a query-driven LLM-RAG system tailored for nanotechnology literature, leveraging real-time, multi-source retrieval to ground generation in verified science. The approach uses LLaMA3.1-8B-Instruct with a dual-embedding retrieval backbone and MPNet-based contextual embeddings to deliver semantically relevant documents, integrated via a Streamlit chat interface. Evaluation by domain experts shows NANOGPT achieves higher depth and factual accuracy than vanilla LLMs, though lay explanations from non-RAG models can be more accessible. The work demonstrates the practical impact of AI-assisted literature retrieval for accelerating nanotechnology research and outlines future directions, including dynamic data updates and integration with ontological knowledge graphs.

Abstract

This paper presents the development and application of a Large Language Model Retrieval-Augmented Generation (LLM-RAG) system tailored for nanotechnology research. The system leverages the capabilities of a sophisticated language model to serve as an intelligent research assistant, enhancing the efficiency and comprehensiveness of literature reviews in the nanotechnology domain. Central to this LLM-RAG system is its advanced query backend retrieval mechanism, which integrates data from multiple reputable sources. The system retrieves relevant literature by using Google Scholar's advanced search and by scraping open-access papers from Elsevier, Springer Nature, and ACS Publications. This multifaceted approach ensures a broad and diverse collection of up-to-date scholarly articles and papers. The proposed system demonstrates significant potential in aiding researchers by providing a streamlined, accurate, and exhaustive literature retrieval process, thereby accelerating research advancements in nanotechnology. The effectiveness of the LLM-RAG system is validated through rigorous testing, which illustrates its capability to significantly reduce the time and effort required for comprehensive literature reviews while maintaining high accuracy and query relevance and outperforming standard, publicly available LLMs.

Paper Structure

This paper contains 21 sections, 4 figures.

Figures (4)

  • Figure 1: RAG Mechanism - the process of querying a database using an embedding model to provide context to an LLM, which in turn generates an answer.
  • Figure 2: Embedding Model for Multi-Modal Data. This diagram illustrates how an embedding model transforms various input types (images, text, and audio) into numerical vector representations. Each data type is fed into the neural network, which generates vectors capturing the essential features of that input for further analysis or comparison. The values in each vector represent high-dimensional features encoded from the original input.
  • Figure 3: Streamlit-based Chat Interface for NANOGPT
  • Figure 4: Effects of slicing the surfaces on surface area
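
The RAG mechanism summarized in the Figure 1 caption (embed the query, search a document store, pass the retrieved context to an LLM) can be illustrated with a minimal sketch. This is not the NANOGPT implementation. The word-count `embed` function is a toy stand-in for the MPNet-based embeddings, the three-sentence `DOCUMENTS` corpus is invented for illustration, and the helper names `retrieve` and `build_prompt` are hypothetical. The sketch stops at prompt construction and does not call an LLM.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (assumed stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query, best first."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, context_docs):
    """Combine the retrieved passages and the question into one LLM prompt."""
    context = "\n".join(f"- {d}" for d in context_docs)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


# Hypothetical mini-corpus, invented for illustration only.
DOCUMENTS = [
    "Gold nanoparticles show size-dependent optical absorption.",
    "Carbon nanotubes have high tensile strength and electrical conductivity.",
    "Streamlit is a Python framework for building web apps.",
]

if __name__ == "__main__":
    question = "What optical properties do gold nanoparticles have?"
    top = retrieve(question, DOCUMENTS, k=1)
    print(build_prompt(question, top))
```

In a real system the prompt returned by `build_prompt` would be sent to the language model, and `embed` would be replaced by a learned sentence-embedding model so that retrieval captures meaning rather than exact word overlap.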