Table of Contents
Fetching ...

IntellectSeeker: A Personalized Literature Management System with the Probabilistic Model and Large Language Model

Weizhen Bian, Siyan Liu, Yubo Zhou, Dezhi Chen, Yijie Liao, Zhenzhen Fan, Aobo Wang

TL;DR

IntellectSeeker tackles the challenge of finding relevant literature amid explosive growth in scholarly content by coupling a probabilistic data crawling model with a semantically enhanced large language model. The system architecture comprises Probabilistic Data Scraping, an Enhanced Search Engine, and Interactive Data Exploration with personalized recommendations, enabling precise matching of user needs and behavioral signals. A fine-tuned GPT-3.5-turbo model is shown to outperform several open-source LLMs in translating everyday language into academic terminology, supporting reliable term substitutions across disciplines. Automatic summarization and word-cloud visualizations further accelerate literature screening, while a multi-component recommendation engine balances explicit and implicit user preferences. The work demonstrates practical improvements in search precision and user experience, with future work focusing on richer user profiling and expanding scholarly services.

Abstract

Faced with the burgeoning volume of academic literature, researchers often need help with uncertain article quality and mismatches in term searches using traditional academic engines. We introduce IntellectSeeker, an innovative and personalized intelligent academic literature management platform to address these challenges. This platform integrates a Large Language Model (LLM)--based semantic enhancement bot with a sophisticated probability model to personalize and streamline literature searches. We adopted the GPT-3.5-turbo model to transform everyday language into professional academic terms across various scenarios using multiple rounds of few-shot learning. This adaptation mainly benefits academic newcomers, effectively bridging the gap between general inquiries and academic terminology. The probabilistic model intelligently filters academic articles to align closely with the specific interests of users, which are derived from explicit needs and behavioral patterns. Moreover, IntellectSeeker incorporates an advanced recommendation system and text compression tools. These features enable intelligent article recommendations based on user interactions and present search results through concise one-line summaries and innovative word cloud visualizations, significantly enhancing research efficiency and user experience. IntellectSeeker offers academic researchers a highly customizable literature management solution with exceptional search precision and matching capabilities. The code can be found here: https://github.com/LuckyBian/ISY5001

IntellectSeeker: A Personalized Literature Management System with the Probabilistic Model and Large Language Model

TL;DR

IntellectSeeker tackles the challenge of finding relevant literature amid explosive growth in scholarly content by coupling a probabilistic data crawling model with a semantically enhanced large language model. The system architecture comprises Probabilistic Data Scraping, an Enhanced Search Engine, and Interactive Data Exploration with personalized recommendations, enabling precise matching of user needs and behavioral signals. A fine-tuned GPT-3.5-turbo model is shown to outperform several open-source LLMs in translating everyday language into academic terminology, supporting reliable term substitutions across disciplines. Automatic summarization and word-cloud visualizations further accelerate literature screening, while a multi-component recommendation engine balances explicit and implicit user preferences. The work demonstrates practical improvements in search precision and user experience, with future work focusing on richer user profiling and expanding scholarly services.

Abstract

Faced with the burgeoning volume of academic literature, researchers often need help with uncertain article quality and mismatches in term searches using traditional academic engines. We introduce IntellectSeeker, an innovative and personalized intelligent academic literature management platform to address these challenges. This platform integrates a Large Language Model (LLM)--based semantic enhancement bot with a sophisticated probability model to personalize and streamline literature searches. We adopted the GPT-3.5-turbo model to transform everyday language into professional academic terms across various scenarios using multiple rounds of few-shot learning. This adaptation mainly benefits academic newcomers, effectively bridging the gap between general inquiries and academic terminology. The probabilistic model intelligently filters academic articles to align closely with the specific interests of users, which are derived from explicit needs and behavioral patterns. Moreover, IntellectSeeker incorporates an advanced recommendation system and text compression tools. These features enable intelligent article recommendations based on user interactions and present search results through concise one-line summaries and innovative word cloud visualizations, significantly enhancing research efficiency and user experience. IntellectSeeker offers academic researchers a highly customizable literature management solution with exceptional search precision and matching capabilities. The code can be found here: https://github.com/LuckyBian/ISY5001

Paper Structure

This paper contains 16 sections, 1 equation, 3 figures, 1 table.

Figures (3)

  • Figure 1: Overview of IntellectSeeker: IntellectSeeker features three core components: Probabilistic Data Scraping, Enhanced Search Engine, and Interactive Data Exploration and Personalized Recommendation. It uses probabilistic algorithms to filter web data that are closely aligned with user preferences. The search engine dynamically adjusts search results based on user queries and preferences, incorporating popular and personalized suggestions. The interactive module uses a text mining model and an LLM-based chatbot to deliver customized recommendations and visual summaries like word clouds, enhancing the user experience through tailored content.
  • Figure 2: IntellectSeeker's data crawling process: This diagram illustrates IntellectSeeker's data crawling process, divided into three key stages. First, user input guides the system in extracting relevant data from the target website using a probabilistic model. Subsequently, the extracted data is evaluated against predefined thresholds to determine its relevance. Finally, it is stored through pointers to ensure storage space conservation.
  • Figure 3: Overview of LLM Search Engine: it outlines a Large Language Model's training and deployment workflow. During the 'Training Phase,' a new corpus is created and used to fine-tune a pre-trained large language model. Once validated, the model is deployed to handle user queries, recognizing and processing inquiries into various formats. Key components include the 'Judgment Box' for response evaluation, 'Actions' guiding system responses, 'LLM Engines' for processing, 'Algorithms' for enhanced functionality, and the 'Corpus' as the training dataset.