SparkRA: A Retrieval-Augmented Knowledge Service System Based on Spark Large Language Model

Dayong Wu; Jiaqi Li; Baoxin Wang; Honghong Zhao; Siyuan Xue; Yanjie Yang; Zhijun Chang; Rui Zhang; Li Qian; Bo Wang; Shijin Wang; Zhixiong Zhang; Guoping Hu

TL;DR

This work tackles the gap in LLMs for scientific literature services by building SciLit-LLM through continual pre-training and supervised fine-tuning on scholarly texts, based on the iFLYTEK Spark LLM. It then deploys SparkRA, a three-function knowledge service (literature investigation, paper reading, academic writing) that operates in English and Chinese. Experimental results show SparkRA outperforms baselines including GPT-3.5 and Llama3-8B across tasks, and even surpasses GPT-4 in paper-polishing quality, with strong translation and polishing metrics. The system demonstrates real-world impact, with over 50,000 registered users and over 1.3 million total interactions as of mid-2024.

Abstract

Large language models (LLMs) have shown remarkable achievements across various language tasks. To enhance the performance of LLMs in scientific literature services, we developed the scientific literature LLM (SciLit-LLM) through pre-training and supervised fine-tuning on scientific literature, building upon the iFLYTEK Spark LLM. Furthermore, we present a knowledge service system, Spark Research Assistant (SparkRA), based on our SciLit-LLM. SparkRA is accessible online and provides three primary functions: literature investigation, paper reading, and academic writing. As of July 30, 2024, SparkRA has garnered over 50,000 registered users, with a total usage count exceeding 1.3 million.

Paper Structure (26 sections, 4 figures, 3 tables)

This paper contains 26 sections, 4 figures, 3 tables.

Introduction
Scientific Literature LLM
Base model
Continual pre-training
Data preparation.
Pre-training.
Supervised fine-tuning
Data preparation.
Training.
SparkRA
Literature investigation
Investigation copilot.
Topic search engine.
Review generation.
Paper reading
...and 11 more sections

Figures (4)

Figure 1: The process of building SparkRA system.
Figure 2: The system architecture of SparkRA integrates iFLYTEK Spark LLM and Scientific Literature LLM to facilitate literature investigation, paper reading, and academic writing.
Figure 3: The architecture of RAG-based literature investigation.
Figure 4: Literature investigation page.
