SparkRA: A Retrieval-Augmented Knowledge Service System Based on Spark Large Language Model
Dayong Wu, Jiaqi Li, Baoxin Wang, Honghong Zhao, Siyuan Xue, Yanjie Yang, Zhijun Chang, Rui Zhang, Li Qian, Bo Wang, Shijin Wang, Zhixiong Zhang, Guoping Hu
TL;DR
This work addresses the limited performance of LLMs in scientific literature services by building SciLit-LLM, obtained through continual pre-training and supervised fine-tuning of the iFLYTEK Spark LLM on scholarly texts. On top of this model it deploys SparkRA, a knowledge service with three functions (literature investigation, paper reading, and academic writing) that operates in English and Chinese. Experimental results show that SparkRA outperforms baselines including GPT-3.5 and Llama3-8B across tasks, and surpasses GPT-4 in paper-polishing quality, with strong results on translation and polishing metrics. The system is in real-world use, with over 50,000 registered users and more than 1.3 million total interactions as of July 30, 2024.
Abstract
Large language models (LLMs) have shown remarkable achievements across various language tasks. To enhance the performance of LLMs in scientific literature services, we developed the scientific literature LLM (SciLit-LLM) through pre-training and supervised fine-tuning on scientific literature, building upon the iFLYTEK Spark LLM. Furthermore, we present a knowledge service system, Spark Research Assistant (SparkRA), based on our SciLit-LLM. SparkRA is accessible online and provides three primary functions: literature investigation, paper reading, and academic writing. As of July 30, 2024, SparkRA has garnered over 50,000 registered users, with a total usage count exceeding 1.3 million.
