ZuantuSet: A Collection of Historical Chinese Visualizations and Illustrations

Xiyao Mei, Yu Zhang, Chaofan Yang, Rui Shi, Xiaoru Yuan

TL;DR

ZuantuSet tackles the underexplored territory of historical Chinese visualizations by introducing a large-scale dataset (over 71K visualizations and 108K illustrations) collected from more than 12,000 books via a semi-automatic pipeline. The pipeline combines data collection from major digital libraries, metadata unification, deduplication, image retrieval, graphic detection with YOLOv8, stitching, and a three-stage taxonomy labeling aided by CLIP-based similarity, enabling scalable annotation of forms and domains. Through a framework adapted from prior visualization research, the authors analyze design patterns in ZuantuSet, revealing strong cultural influences such as textualism and pictorialism, and mapping relationships between forms (maps, genealogies, tables) and domains (geography, history, biology). They also present usage scenarios, including textual criticism and culture-focused design, and discuss design implications for cross-cultural visualization research and reviving historical Chinese graphics. Overall, ZuantuSet provides a foundation for cross-cultural studies of visualization history and offers practical tools for exploring, interpreting, and reusing historical Chinese graphics in education, design, and research.

Abstract

Historical visualizations are a valuable resource for studying the history of visualization and inspecting the cultural context where they were created. When investigating historical visualizations, it is essential to consider contributions from different cultural frameworks to gain a comprehensive understanding. While there is extensive research on historical visualizations within the European cultural framework, this work shifts the focus to ancient China, a cultural context that remains underexplored by visualization researchers. To this aim, we propose a semi-automatic pipeline to collect, extract, and label historical Chinese visualizations. Through the pipeline, we curate ZuantuSet, a dataset with over 71K visualizations and 108K illustrations. We analyze distinctive design patterns of historical Chinese visualizations and their potential causes within the context of Chinese history and culture. We illustrate potential usage scenarios for this dataset, summarize the unique challenges and solutions associated with collecting historical Chinese visualizations, and outline future research directions.

Paper Structure

This paper contains 24 sections, 7 figures.

Figures (7)

  • Figure 1: The semi-automatic data curation pipeline for historical Chinese visualizations and illustrations: (A) We collect historical Chinese books from collections of three digital libraries. (B) We unify and deduplicate book metadata from different libraries, then fetch corresponding images. (C) We utilize YOLO to detect graphics from these images. Some incomplete graphics are stitched. (D) The graphics are further classified through three steps. Initial labeling aims to obtain a basic taxonomy. Batching labeling aims to further expand the quantity of labeled graphics. Similarity-based matching assigns labels to unlabeled graphics. (E) The data structures for the metadata of three types of entities involved in ZuantuSet: book, image, and graphic.
  • Figure 2: Two interactive systems of ZuantuSet: (A) ZuantuSet Gallery for the user to browse historical Chinese graphics. (A1) Detail panel of a graphic. (A2) Timeline showing the temporal distribution of the filtered graphics. (B) ZuantuSet Labeler for the user to categorize historical Chinese graphics. (B1) Single mode labeling for the user to edit the bounding box and metadata of a single graphic. (B2) Batch mode labeling for the user to retrieve multiple graphics through metadata query or similarity matching and label them at once.
  • Figure 3: A framework of elements involved in historical visualizations as a visual communication channel: The framework is adapted from frameworks in the literature [Zhang2021Mapping, Munzner2009Nested, Lasswell1948Structure]. We emphasize the impact of historical factors, such as politics, cultures, and religions, on the framework components. We also consider the influence on contemporary perspectives when investigating the effect of historical visualizations.
  • Figure 4: The correlation between form and domain of visualizations in ZuantuSet.
  • Figure 5: Examples of maps in ZuantuSet: (Left) https://www.digital.archives.go.jp/acv/auto_conversion/conv/jp2jpeg?ID=M2021050616304107841&p=6, from Qi jing tu [Wu1615Qi], 1615. The layout plan is drawn in a planimetric mode. The rectangle in the top-left corner is an ancestral temple dedicated to the spirits of deceased ancestors. The annotated Chinese character may represent a tablet bearing the ancestors' names; it is rotated 90 degrees to indicate that the tablet may face the central point. (Middle) https://ids.lib.harvard.edu/ids/iiif/23518593/full/full/0/default.jpg, from Guang yu tu [Zhu1566Guang], 1566. The map shows the system built during the Ming dynasty (1368–1644) to protect the northern border and the Great Wall. We annotate pictorial elements such as mountains and rivers. Geographic location can be encoded by the scattered texts alone: the position of the annotated text indicates the location of the army and government office of Guizhou. The map is drawn on a grid whose cell length corresponds to 500 Li (a traditional Chinese unit of distance). (Right) \UrlOfDongXiFenShanTu, from Tian xia shan he liang jie kao [Xu1723Tian], 1723. Paragraphs are written directly on the map, providing additional historical information about specific locations. We add annotations to the map to indicate some of these paragraphs and point them to the corresponding locations.
  • ...and 2 more figures
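
The similarity-based matching step in the Figure 1 caption (assigning labels to unlabeled graphics from already labeled ones) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each graphic is already represented by an embedding vector (for example, one produced by CLIP), and the function names `cosine` and `match_label`, the neighbor count `k`, and the threshold `min_sim` are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_label(query, labeled, k=3, min_sim=0.8):
    """Assign a label to `query` by majority vote among its nearest labeled neighbors.

    `labeled` is a list of (embedding, label) pairs. Only the top-k neighbors
    whose similarity is at least `min_sim` (a hypothetical threshold) may vote.
    Returns None when no neighbor is similar enough, so the graphic can be
    left for manual labeling.
    """
    scored = sorted(
        ((cosine(query, emb), label) for emb, label in labeled),
        key=lambda pair: pair[0],
        reverse=True,
    )
    voters = [label for sim, label in scored[:k] if sim >= min_sim]
    if not voters:
        return None
    return Counter(voters).most_common(1)[0][0]


# Toy 2-D embeddings standing in for real image embeddings (assumption).
labeled_graphics = [
    ([1.0, 0.0], "map"),
    ([0.9, 0.1], "map"),
    ([0.0, 1.0], "table"),
]
```

Calling `match_label([1.0, 0.05], labeled_graphics)` votes among the two nearby "map" examples, while a query that resembles nothing in the labeled set yields `None`. Returning `None` instead of forcing a label keeps low-confidence cases in the manual part of a semi-automatic workflow.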