Bio-Eng-LMM AI Assist chatbot: A Comprehensive Tool for Research and Education

Ali Forootani, Danial Esmaeili Aliabadi, Daniela Thraen

TL;DR

Bio-Eng-LMM addresses the need for accessible AI-assisted education and research by integrating Retrieval Augmented Generation (RAG) with multimodal capabilities. The system combines preprocessed documents, user-uploaded data, and real-time web data with image generation (Stable Diffusion) and image understanding (LLAVA), plus Whisper-based speech recognition and a Gradio GUI. Its modular RAG pipeline, multimodal processing, and web-enabled retrieval enable context-rich, up-to-date responses for cross-disciplinary learning and research tasks, including bioenergy domain applications. The work demonstrates practical impact by enabling interactive storytelling, data visualization, and classroom-assisted inquiry, with open-source code to foster replication and extension in educational and research settings.

Abstract

This article introduces the Bio-Eng-LMM AI chatbot, a versatile platform designed to enhance user interaction for educational and research purposes. Leveraging cutting-edge open-source Large Language Models (LLMs), Bio-Eng-LMM operates as a sophisticated AI assistant, exploiting the capabilities of traditional models like ChatGPT. Central to Bio-Eng-LMM is its implementation of Retrieval Augmented Generation (RAG) through three primary methods: integration of preprocessed documents, real-time processing of user-uploaded files, and information retrieval from any specified website. Additionally, the chatbot incorporates image generation via a Stable Diffusion Model (SDM), image understanding and response generation through LLAVA, and internet search powered by a secure search engine such as DuckDuckGo. To provide comprehensive support, Bio-Eng-LMM offers text summarization, website content summarization, and both text and voice interaction. The chatbot maintains session memory to ensure contextually relevant and coherent responses. This integrated platform builds upon the strengths of RAG-GPT and Web-Based RAG Query (WBRQ), in which the system fetches relevant information directly from the web to enhance the LLM's response generation.
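The retrieve-then-generate flow with session memory described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words cosine retriever stands in for whatever embedding model the system uses, the function names (`retrieve`, `build_prompt`) and the sample sentences are hypothetical, and the final call to an LLM is omitted, so the sketch stops at the assembled prompt.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=2):
    """Return up to k chunks with non-zero similarity to the query, best first."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(c))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored[:k] if score > 0]


def build_prompt(query, chunks, history, k=2):
    """Combine retrieved context, session memory, and the new query into one prompt."""
    lines = ["Context:"]
    lines += [f"- {c}" for c in retrieve(query, chunks, k)]
    lines.append("Conversation so far:")
    lines += [f"{role}: {msg}" for role, msg in history]
    lines.append(f"user: {query}")
    return "\n".join(lines)


# Hypothetical sample data; a real deployment would index document chunks
# taken from preprocessed files, user uploads, or fetched web pages.
chunks = [
    "Biomass can be converted to biogas through anaerobic digestion.",
    "Stable Diffusion generates images from text prompts.",
    "Whisper transcribes speech to text.",
]
history = [("user", "Hello"), ("assistant", "Hi, how can I help?")]

prompt = build_prompt("How is biogas produced from biomass?", chunks, history, k=1)
print(prompt)
```

The same `build_prompt` step applies to all three retrieval sources named in the abstract, since they differ only in where `chunks` comes from. The resulting prompt would then be passed to the language model, and the exchange appended to `history` to preserve session memory.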

Paper Structure

This paper contains 45 sections and 8 figures.

Figures (8)

  • Figure 1: A typical RAG architecture functions as follows: The user submits queries in various formats, which are fed into both the retriever and the generator. The retriever extracts relevant information from data sources, and the generator uses this information to produce outputs in different formats [zhao2024retrieval].
  • Figure 2: Transformer models employ self-attention mechanisms to process input sequences in parallel, capturing long-range dependencies. Encoders generate contextual representations, while decoders generate output sequences. Key components include input/positional encoding, multi-head attention, feed-forward neural networks, and layer normalization.
  • Figure 3: Illustration of the multimodal chatbot, with capabilities for image understanding, image generation, document summarization, question answering from user input and internet search, and automatic speech recognition.
  • Figure 4: Bio-Eng-LMM chatbot: responding to a user request to write Python code.
  • Figure 5: Bio-Eng-LMM chatbot: RAG functionality for document summarization. Here a PDF document is uploaded and the user asks for a summary of it.
  • ...and 3 more figures