
A Preliminary Study of RAG for Taiwanese Historical Archives

Claire Lin, Bo-Han Feng, Xuanjun Chen, Te-Lun Yang, Hung-yi Lee, Jyh-Shing Roger Jang

TL;DR

This study adapts a Retrieval-Augmented Generation (RAG) pipeline to two Traditional Chinese historical corpora from Taiwan, Fort Zeelandia and the Taiwan Provincial Council Gazette, each with rich query- and document-level metadata. It systematically compares sparse, dense, and hybrid retrieval, along with four retrieval strategies that differ in where metadata is integrated. GPT-4o generates the answers, and Gemini-2.5-Pro evaluates them for groundedness, relevance, and hallucination. The results show that integrating metadata at the retrieval stage improves recall and grounding and reduces, but does not eliminate, hallucinations; temporal and multi-hop questions remain especially challenging. The work contributes a systematic methodology and public datasets for humanities-focused RAG research and highlights practical considerations for applying RAG to historical, non-English archives.
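This summary does not say how the hybrid retriever combines sparse and dense results. As an illustration only, the sketch below uses Reciprocal Rank Fusion (RRF), one widely used fusion method that merges ranked lists without needing the sparse and dense scores to be on the same scale. It is a swapped-in technique, not necessarily the authors' method, and the document IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first ranked lists of doc IDs with Reciprocal Rank Fusion.

    rankings: list of ranked lists, e.g. one from a sparse retriever, one from a dense one.
    k: smoothing constant; 60 is the value used in the original RRF paper.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); documents ranked high in several lists win.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending), breaking ties by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


sparse = ["d3", "d1", "d7"]  # hypothetical sparse (e.g. BM25) ranking
dense = ["d1", "d5", "d3"]   # hypothetical dense (embedding) ranking
print(reciprocal_rank_fusion([sparse, dense]))  # ['d1', 'd3', 'd5', 'd7']
```

Because RRF uses only ranks, it avoids calibrating BM25 scores against cosine similarities. A weighted sum of normalized scores is a common alternative when the relative trust in each retriever needs tuning.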

Abstract

Retrieval-Augmented Generation (RAG) has emerged as a promising approach for knowledge-intensive tasks. However, few studies have examined RAG for Taiwanese historical archives. In this paper, we present an initial study of a RAG pipeline applied to two historical Traditional Chinese datasets, Fort Zeelandia and the Taiwan Provincial Council Gazette, along with their corresponding open-ended query sets. We systematically investigate how query characteristics and metadata integration strategies affect retrieval quality, answer generation, and overall system performance. The results show that early-stage metadata integration improves both retrieval and answer accuracy. They also reveal persistent challenges for RAG systems, including hallucinations during generation and difficulties in handling temporal or multi-hop historical queries.
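Retrieval quality in the figures is reported as Recall@5. This summary does not define the metric precisely. The sketch below assumes one common definition: the fraction of a query's gold documents that appear in the top k retrieved results, averaged over queries. Some works instead report a hit rate (1 if any gold document is in the top k). The queries and document IDs are hypothetical.

```python
def recall_at_k(retrieved, relevant, k=5):
    """Fraction of the relevant documents that appear in the top-k retrieved list.

    retrieved: ranked list of doc IDs (best first).
    relevant: collection of gold doc IDs for the query.
    """
    relevant = set(relevant)
    if not relevant:
        raise ValueError("query has no relevant documents")
    hits = relevant.intersection(retrieved[:k])
    return len(hits) / len(relevant)


def mean_recall_at_k(runs, k=5):
    """Average Recall@k over (retrieved, relevant) pairs, one pair per query."""
    return sum(recall_at_k(r, g, k) for r, g in runs) / len(runs)


runs = [
    (["d1", "d2", "d3", "d4", "d5", "d6"], {"d2", "d6"}),  # d6 is ranked 6th, so it is missed at k=5
    (["d9", "d4", "d8", "d7", "d1"], {"d4"}),
]
print(mean_recall_at_k(runs, k=5))  # (0.5 + 1.0) / 2 = 0.75
```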

Paper Structure

This paper contains 26 sections, 2 equations, 21 figures, 5 tables.

Figures (21)

  • Figure 1: Overview of the RAG pipeline and the components of each stage. The two highlighted elements, Query and Metadata, are the key factors affecting RAG system performance that this paper focuses on. These factors are detailed in Section 3.1 and Section 3.2, respectively; Section 6.2 and Section 6.3 analyze how they affect retrieval and generation performance.
  • Figure 2: Overview of four retrieval strategies with different metadata integration stages explored in this work. (a) Baseline Retrieval retrieves top passages using only the query and document content. (b) Metadata-Augmented Retrieval integrates metadata into the document representation during retrieval. (c) Metadata-Only Reranking uses only metadata during the reranking stage after initial retrieval. (d) Metadata-Augmented Reranking incorporates both document content and metadata in the reranking stage.
  • Figure 3: Fort Zeelandia Dataset Recall@5 per Question Complexity by Retriever
  • Figure 4: Fort Zeelandia Dataset Recall@5 per Entity Focus by Retriever
  • Figure 5: TPCG retrieval performance across different metadata integration stages and metadata types. Left: Metadata-Augmented Retrieval performance across different metadata types. Center: Performance of Metadata-Only Reranking across different metadata types. Right: Retrieval performance of Metadata-Augmented Reranking across different metadata types.
  • ...and 16 more figures
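The Figure 2 caption distinguishes four strategies by where metadata enters the pipeline. The sketch below shows only that structural difference. It is not the authors' implementation. A hypothetical token-overlap scorer stands in for the paper's sparse and dense retrievers. Metadata is assumed to be serialized as "key value" text, and the field names and candidate-pool size `n` are illustrative. Whitespace tokenization will not work on Traditional Chinese text; a real pipeline would need word segmentation or character-level tokenization.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    doc_id: str
    text: str
    metadata: dict = field(default_factory=dict)  # hypothetical fields, e.g. {"year": "1960"}


def score(query, text):
    """Toy lexical scorer: count of shared whitespace tokens (stand-in for BM25 or a dense model)."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def meta_text(doc):
    # Assumption: metadata is serialized as plain "key value" text.
    return " ".join(f"{k} {v}" for k, v in doc.metadata.items())


def top_k(query, docs, represent, k):
    # Rank by score on the chosen representation; ties broken by doc ID.
    return sorted(docs, key=lambda d: (-score(query, represent(d)), d.doc_id))[:k]


def baseline(query, docs, k=5):
    # (a) Baseline Retrieval: query vs. document content only.
    return top_k(query, docs, lambda d: d.text, k)


def metadata_augmented_retrieval(query, docs, k=5):
    # (b) Metadata is folded into the document representation used for first-stage retrieval.
    return top_k(query, docs, lambda d: meta_text(d) + " " + d.text, k)


def metadata_only_rerank(query, docs, k=5, n=20):
    # (c) Retrieve n candidates on content, then rerank them using metadata alone.
    return top_k(query, baseline(query, docs, k=n), meta_text, k)


def metadata_augmented_rerank(query, docs, k=5, n=20):
    # (d) Retrieve n candidates on content, then rerank on content plus metadata.
    return top_k(query, baseline(query, docs, k=n), lambda d: meta_text(d) + " " + d.text, k)


docs = [
    Doc("d1", "council debate on rice prices", {"year": "1951"}),
    Doc("d2", "council debate on school budgets", {"year": "1960"}),
    Doc("d3", "rice harvest report", {"year": "1960"}),
]
query = "1960 council debate"
for fn in (baseline, metadata_augmented_retrieval, metadata_only_rerank, metadata_augmented_rerank):
    print(fn.__name__, [d.doc_id for d in fn(query, docs, k=1)])
# baseline ['d1']; the three metadata-aware strategies return ['d2']
```

Note the structural difference: in (c) and (d), metadata can only reorder the `n` candidates that content retrieval surfaced, whereas in (b) it can change which documents are retrieved at all.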