NANOGPT: A Query-Driven Large Language Model Retrieval-Augmented Generation System for Nanotechnology Research
Achuth Chandrasekhar, Omid Barati Farimani, Olabode T. Ajenifujah, Janghoon Ock, Amir Barati Farimani
TL;DR
NANOGPT introduces a query-driven LLM-RAG system tailored for nanotechnology literature, leveraging real-time, multi-source retrieval to ground generation in verified science. The approach uses LLaMA3.1-8B-Instruct with a dual-embedding retrieval backbone and MPNet-based contextual embeddings to deliver semantically relevant documents, integrated via a Streamlit chat interface. Evaluation by domain experts shows NANOGPT achieves higher depth and factual accuracy than vanilla LLMs, though lay explanations from non-RAG models can be more accessible. The work demonstrates the practical impact of AI-assisted literature retrieval for accelerating nanotechnology research and outlines future directions, including dynamic data updates and integration with ontological knowledge graphs.
Abstract
This paper presents the development and application of a Large Language Model Retrieval-Augmented Generation (LLM-RAG) system tailored for nanotechnology research. The system uses a large language model as an intelligent research assistant, improving the efficiency and comprehensiveness of literature reviews in the nanotechnology domain. Central to the system is its query backend retrieval mechanism, which integrates data from multiple reputable sources: it retrieves relevant literature through Google Scholar's advanced search and by scraping open-access papers from Elsevier, Springer Nature, and ACS Publications. This multi-source approach yields a broad and diverse collection of up-to-date scholarly articles. The proposed system shows significant potential to aid researchers by providing a streamlined, accurate, and exhaustive literature retrieval process, thereby accelerating research in nanotechnology. Its effectiveness is validated through rigorous testing, which illustrates that it can significantly reduce the time and effort required for comprehensive literature reviews while maintaining high accuracy and query relevance and outperforming standard, publicly available LLMs.
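As a rough illustration of the retrieve-then-generate pattern summarized above, the following minimal sketch ranks candidate passages against a query by cosine similarity and assembles a grounded prompt. It is not the authors' implementation: the `embed` function is a toy word-count embedder standing in for the MPNet-based encoder, the function names `retrieve` and `build_prompt` are hypothetical, and the example passages are illustrative placeholders rather than retrieved papers. The call to the language model itself is omitted.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a neural sentence encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, documents: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Return the k passages most similar to the query, highest score first."""
    q = embed(query)
    scored = [(cosine(q, embed(d)), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(query: str, passages: list[tuple[float, str]]) -> str:
    """Assemble a prompt that asks the model to answer from the retrieved context only."""
    context = "\n".join(f"- {text}" for _, text in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    # Illustrative placeholder passages, not actual retrieved literature.
    docs = [
        "Gold nanoparticles exhibit size-dependent plasmon resonance.",
        "Carbon nanotubes have high tensile strength.",
        "Streamlit is a Python framework for building web apps.",
    ]
    question = "plasmon resonance in gold nanoparticles"
    top = retrieve(question, docs, k=2)
    print(build_prompt(question, top))
```

In a real system the toy embedder would be replaced by a learned sentence encoder and the passages would come from the literature sources named in the abstract; the ranking and prompt-assembly steps would keep the same shape.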
