NAIST Academic Travelogue Dataset

Hiroki Ouchi, Hiroyuki Shindo, Shoko Wakamiya, Yuki Matsuda, Naoya Inoue, Shohei Higashiyama, Satoshi Nakamura, Taro Watanabe

TL;DR

The paper presents the NAIST Academic Travelogue Dataset (ATD), a large-scale, freely available Japanese text resource designed to study human–place dynamics through travelogues and optional travel schedules. It documents the dataset construction from Arukikata (2007–2022), yielding over 31 million words across 14,279 travelogues (4,672 domestic, 9,607 overseas) and describes automated processing with GiNZA for segmentation and named-entity recognition, including POIs. The authors provide descriptive statistics on text length, POI density, and geographic coverage (all prefectures domestically; 150+ overseas destinations), and discuss the potential for reproducible analyses and cross-study benchmarking. They outline future directions for linguistic annotations and geographic linking to map coordinates, enabling applications in movement analysis, destination trend discovery, hidden-spot identification, and travel planning.

Abstract

We have constructed the NAIST Academic Travelogue Dataset (ATD) and released it free of charge for academic research. It is a Japanese text dataset of over 31 million words in total, comprising 4,672 Japanese domestic travelogues and 9,607 overseas travelogues. Before this release, widely available travelogue data for research purposes was scarce, and each researcher had to prepare their own data. This situation hindered the replication of existing studies and the fair comparison of experimental results. Our dataset enables any researcher to conduct investigations on the same data, which helps ensure transparency and reproducibility in research. In this paper, we describe the academic significance, characteristics, and prospects of our dataset.

Paper Structure

This paper contains 12 sections, 3 figures, and 4 tables.

Figures (3)

  • Figure 1: Example of a travelogue.
  • Figure 2: Example of a travel schedule.
  • Figure 3: Distribution of the number of domestic travelogues mentioning each prefecture.