RAG-Optimized Tibetan Tourism LLMs: Enhancing Accuracy and Personalization
Jinhu Qi, Shuai Yan, Yibo Zhang, Wentao Zhang, Rong Jin, Yuwei Hu, Ke Wang
TL;DR
The paper tackles the problem of delivering accurate, personalized Tibetan cultural tourism recommendations while mitigating LLM hallucinations. It introduces a retrieval-augmented generation (RAG) framework built on a database of 563 tourist viewpoints derived from Ctrip and Wikipedia. Database entries are vectorized with either TF-IDF or BERT, and the two are compared; the TF-IDF weight is $w_{x,y}=tf_{x,y}\cdot\log\left(\frac{N}{df_x}\right)$, where $tf_{x,y}$ is the frequency of term $x$ in document $y$, $df_x$ is the number of documents containing $x$, and $N$ is the total number of documents. In the evaluation, TF-IDF with HNSWFlat and L2 distance emerges as the best retrieval configuration, and integrating external knowledge via RAG improves the fluency, accuracy, and relevance of generated content; notably, Llama3-8b achieves substantial gains in relevance (0.674 → 0.972) and overall performance (≈92.25%). The study demonstrates RAG’s potential to standardize cultural tourism information and support intelligent service systems, while highlighting the balance between parametric memory and non-parametric knowledge sources. These findings offer a practical foundation for deploying robust, context-aware Tibetan tourism assistants and can guide future applications of RAG in other specialized cultural domains.
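The TF-IDF weighting and L2-distance retrieval mentioned above can be illustrated with a minimal pure-Python sketch. It is not the authors' pipeline: the function names, the whitespace tokenizer, and the three-entry toy corpus are assumptions made for illustration, and an exact brute-force L2 search stands in for the HNSWFlat index, which would return approximate nearest neighbors at much larger scale.

```python
import math
from collections import Counter


def build_tfidf(docs):
    """Vectorize docs with w[x,y] = tf[x,y] * log(N / df[x])."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # df[x]: number of documents that contain term x at least once
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vocab = sorted(df)
    idf = {term: math.log(n_docs / df[term]) for term in vocab}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts in this document
        vectors.append([tf[term] * idf[term] for term in vocab])
    return vocab, idf, vectors


def embed_query(query, vocab, idf):
    """Map a query into the same TF-IDF space; unknown terms are ignored."""
    tf = Counter(query.lower().split())
    return [tf[term] * idf[term] for term in vocab]


def l2_nearest(query_vec, vectors):
    """Exact L2 search: index of the vector closest to query_vec."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(vectors)),
               key=lambda i: squared_distance(query_vec, vectors[i]))


# Hypothetical toy corpus; the real database holds 563 viewpoint entries.
docs = [
    "potala palace lhasa historic palace",
    "namtso lake high altitude lake",
    "jokhang temple lhasa pilgrimage temple",
]
vocab, idf, vectors = build_tfidf(docs)
best = l2_nearest(embed_query("lake altitude", vocab, idf), vectors)
print(docs[best])
```

In a RAG setup, the retrieved entry would then be placed in the LLM prompt as external context. Note that a term appearing in every document gets weight $\log(1)=0$ under this formula, so it contributes nothing to the distance.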
Abstract
With the development of the modern social economy, tourism has become an important way to meet people's spiritual needs, bringing development opportunities to the tourism industry. However, existing large language models (LLMs) struggle with personalized recommendation and sometimes generate hallucinated content. This study proposes an optimization scheme for Tibet tourism LLMs based on retrieval-augmented generation (RAG). By constructing a database of tourist viewpoints and processing the data with vectorization techniques, we significantly improve retrieval accuracy. Applying RAG effectively addresses the hallucination problem in content generation, and the optimized model shows significant improvements in the fluency, accuracy, and relevance of generated content. This research demonstrates the potential of RAG in the standardization of cultural tourism information and in data analysis, providing theoretical and technical support for the development of intelligent cultural tourism service systems.
