Disrupt Your Research Using Generative AI Powered ScienceSage

Yong Zhang, Eric Herrison Gyamfi, Kelly Anderson, Sasha Roberts, Matt Barker

TL;DR

The paper tackles the challenge of keeping GenAI-assisted scientific research up-to-date and well-grounded by proposing ScienceSage, an MVP web app that builds and queries cross-modal knowledge bases. It fuses vector embeddings and knowledge graphs to support three interrelated functions: generating structured research reports, chatting with documents, and multimodal data conversations. Through a comprehensive RAG evaluation, it demonstrates that a hybrid Custom Index consistently yields higher correctness, relevance, and faithfulness than purely vector or KG approaches, enabling faster, more reliable scientific insights. The work emphasizes practical deployment, integration of multiple data modalities, and future work on scaling, graph analytics, and safety, signaling a path toward robust, real-time scientific knowledge management in industry settings.

Abstract

Large Language Models (LLMs) are disrupting science and research across subjects and industries. Here we report a minimum-viable-product (MVP) web application called ScienceSage. It leverages generative artificial intelligence (GenAI) to help researchers disrupt the speed, magnitude and scope of product innovation. ScienceSage enables researchers to build, store, update and query a knowledge base (KB). A KB codifies a user's knowledge/information of a given domain in both a vector index and a knowledge graph (KG) index for efficient information retrieval and querying. The knowledge/information can be extracted from the user's textual documents, images, videos, audio and/or the research reports generated from a research question and the latest relevant information on the internet. The same set of KBs interconnects three functions on ScienceSage: 'Generate Research Report', 'Chat With Your Documents' and 'Chat With Anything'. We share our learnings to encourage discussion and improvement of GenAI's role in scientific research.
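To make the dual-index idea concrete, here is a minimal sketch of a knowledge base that stores the same domain knowledge in both a vector index and a KG index and answers a query from both. It is an illustration under stated assumptions, not the ScienceSage implementation: the class `ToyKnowledgeBase`, its methods, the bag-of-words `embed` function (a stand-in for a real embedding model) and the sample data are all hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding'; an assumed stand-in for a real model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyKnowledgeBase:
    """Hypothetical KB holding a vector index and a KG index side by side."""

    def __init__(self):
        self.chunks = []   # vector index: (text, embedding) pairs
        self.triples = []  # KG index: (subject, relation, object) triples

    def add_chunk(self, text):
        self.chunks.append((text, embed(text)))

    def add_triple(self, subject, relation, obj):
        self.triples.append((subject, relation, obj))

    def vector_search(self, query, k=2):
        """Return the k chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(self.chunks, key=lambda c: cosine(q, c[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

    def kg_search(self, query):
        """Return triples whose subject or object appears as a query word."""
        words = set(query.lower().split())
        return [t for t in self.triples
                if t[0].lower() in words or t[2].lower() in words]

    def hybrid_search(self, query, k=2):
        """Combine both retrieval paths into one context for an LLM prompt."""
        return {"chunks": self.vector_search(query, k),
                "triples": self.kg_search(query)}


kb = ToyKnowledgeBase()
kb.add_chunk("graphene conducts electricity very well")
kb.add_chunk("polymers are long chain molecules")
kb.add_triple("graphene", "is_a", "conductor")
result = kb.hybrid_search("is graphene a conductor", k=1)
print(result)
```

In this sketch the two retrieval paths are simply returned together; how a production system ranks, merges or weights vector hits against graph hits is a design choice the abstract does not specify.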

Paper Structure

This paper contains 19 sections, 10 figures, 1 table.

Figures (10)

  • Figure 1: ScienceSage web application
  • Figure 2: Architecture of ScienceSage
  • Figure 3: The average quality of response based on (a) correctness, (b) relevance and (c) faithfulness for easy, medium and hard queries. Each difficulty level has 765 queries.
  • Figure A1: The average quality of response based on (a) correctness, (b) relevance and (c) faithfulness for all queries. There are 2295 queries.
  • Figure A2: The average quality of response based on (a) correctness, (b) relevance and (c) faithfulness for easy, medium and hard queries with fewer keywords in the queries. Each difficulty level has 255 queries.
  • ...and 5 more figures