ZuantuSet: A Collection of Historical Chinese Visualizations and Illustrations
Xiyao Mei, Yu Zhang, Chaofan Yang, Rui Shi, Xiaoru Yuan
TL;DR
ZuantuSet tackles the underexplored territory of historical Chinese visualizations by introducing a large-scale dataset (over 71K visualizations and 108K illustrations) collected from more than 12,000 books via a semi-automatic pipeline. The pipeline combines data collection from major digital libraries, metadata unification, deduplication, image retrieval, graphic detection with YOLOv8, stitching, and a three-stage taxonomy labeling aided by CLIP-based similarity, enabling scalable annotation of forms and domains. Through a framework adapted from prior visualization research, the authors analyze design patterns in ZuantuSet, revealing strong cultural influences such as textualism and pictorialism, and mapping relationships between forms (maps, genealogies, tables) and domains (geography, history, biology). They also present usage scenarios, including textual criticism and culture-focused design, and discuss design implications for cross-cultural visualization research and reviving historical Chinese graphics. Overall, ZuantuSet provides a foundation for cross-cultural studies of visualization history and offers practical tools for exploring, interpreting, and reusing historical Chinese graphics in education, design, and research.
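As a rough illustration of how similarity-aided labeling can work in principle, the sketch below suggests a form label for an unlabeled image by majority vote among its most similar labeled neighbors. It is not the authors' implementation: the function names `cosine` and `suggest_label`, the k-nearest-neighbor voting rule, and the three-dimensional toy vectors are all assumptions made for illustration. In a real pipeline the vectors would be image embeddings from a model such as CLIP.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def suggest_label(query, labeled, k=3):
    """Suggest a label for `query` by majority vote among its k most similar labeled items.

    `labeled` is a list of (embedding, label) pairs. This voting rule is a
    hypothetical stand-in for similarity-aided annotation, not the paper's method.
    """
    ranked = sorted(labeled, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy embeddings (assumed); real embeddings would come from an image encoder.
labeled = [
    ([1.0, 0.0, 0.1], "map"),
    ([0.9, 0.1, 0.0], "map"),
    ([0.0, 1.0, 0.1], "genealogy"),
    ([0.1, 0.9, 0.0], "genealogy"),
    ([0.0, 0.1, 1.0], "table"),
]

suggestion = suggest_label([0.95, 0.05, 0.05], labeled, k=3)
```

A suggestion produced this way would still need review by a human annotator, which matches the semi-automatic character of the pipeline described above.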
Abstract
Historical visualizations are a valuable resource for studying the history of visualization and for examining the cultural context in which they were created. When investigating historical visualizations, it is essential to consider contributions from different cultural frameworks to gain a comprehensive understanding. While there is extensive research on historical visualizations within the European cultural framework, this work shifts the focus to ancient China, a cultural context that remains underexplored by visualization researchers. To this end, we propose a semi-automatic pipeline to collect, extract, and label historical Chinese visualizations. Using this pipeline, we curate ZuantuSet, a dataset with over 71K visualizations and 108K illustrations. We analyze distinctive design patterns of historical Chinese visualizations and their potential causes within the context of Chinese history and culture. We illustrate potential usage scenarios for this dataset, summarize the unique challenges and solutions associated with collecting historical Chinese visualizations, and outline future research directions.
