Puzzle Pieces Picker: Deciphering Ancient Chinese Characters with Radical Reconstruction

Pengjie Wang, Kaile Zhang, Xinyu Wang, Shengwei Han, Yongge Liu, Lianwen Jin, Xiang Bai, Yuliang Liu

TL;DR

This work introduces Puzzle Pieces Picker (P$^3$), a Transformer-based framework that deciphers ancient Chinese characters by deconstructing them into radicals and reconstructing their modern forms, guided by Ideographic Description Sequences (IDS). It also presents ACCP, a large-scale dataset spanning seven historical periods, with ~90k categories and ~340k images annotated with radical sequences, which enables cross-era learning. Using radical decomposition via contour-based and SAM-based segmentation, MoCo-based labeling, and cross-era reconstruction, P$^3$ shows promising decipherment performance, with notable gains when data from Bronze through Kangxi scripts is incorporated. The results indicate that cross-era data fusion can significantly improve decipherment of the oldest material, the Oracle Bone Inscriptions (OBI), and suggest applicability to other radical-based writing systems. Overall, the work advances paleography by combining radical-level analysis with modern sequence modeling, and it provides a valuable resource for future research in ancient script decipherment.

Abstract

Oracle Bone Inscriptions (OBI) are one of the oldest existing forms of writing in the world. However, owing to their great antiquity, a large number of OBI remain undeciphered, making their decipherment one of the global challenges in the field of paleography today. This paper introduces a novel approach, namely Puzzle Pieces Picker (P$^3$), to decipher these enigmatic characters through radical reconstruction. We deconstruct OBI into foundational strokes and radicals, then employ a Transformer model to reconstruct them into their modern counterparts, offering a groundbreaking solution to ancient script analysis. To further this endeavor, a new Ancient Chinese Character Puzzles (ACCP) dataset was developed, comprising an extensive collection of character images from seven key historical stages, annotated with detailed radical sequences. The experiments yield promising results, underscoring the potential and effectiveness of our approach in deciphering the intricacies of ancient Chinese scripts. Through this novel dataset and methodology, we aim to bridge the gap between traditional philology and modern document analysis techniques, offering new insights into the rich history of Chinese linguistic heritage.
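
To make the idea of a "radical sequence" concrete, the following is a minimal sketch of how an Ideographic Description Sequence can be parsed into a layout tree and flattened into its components. It is an illustration only, not the authors' implementation: the function names `parse_ids` and `radicals` are hypothetical, the exact annotation format used in ACCP is not specified here, and the sketch assumes the twelve Unicode Ideographic Description Characters U+2FF0 to U+2FFB written in prefix order.

```python
# Illustrative sketch only; names and structure are assumptions, not the paper's code.

# Arity of the Ideographic Description Characters U+2FF0..U+2FFB:
# U+2FF2 and U+2FF3 take three operands, the others take two.
TERNARY = {"\u2ff2", "\u2ff3"}
BINARY = {chr(c) for c in range(0x2FF0, 0x2FFC)} - TERNARY


def parse_ids(seq):
    """Parse a prefix-notation IDS string into a nested tuple tree."""

    def walk(i):
        ch = seq[i]
        if ch in BINARY or ch in TERNARY:
            arity = 3 if ch in TERNARY else 2
            children = []
            i += 1
            for _ in range(arity):
                child, i = walk(i)
                children.append(child)
            return (ch, *children), i
        # Any other character is treated as a leaf component.
        return ch, i + 1

    tree, end = walk(0)
    if end != len(seq):
        raise ValueError("trailing characters after a complete IDS")
    return tree


def radicals(tree):
    """Flatten a parsed tree into its leaf components, in reading order."""
    if isinstance(tree, str):
        return [tree]
    leaves = []
    for child in tree[1:]:
        leaves.extend(radicals(child))
    return leaves


# "安" is the roof component "宀" above "女" (top-bottom layout, operator U+2FF1).
an_tree = parse_ids("⿱宀女")
# "案" is "安" above "木"; expanding "安" gives a nested sequence.
case_tree = parse_ids("⿱⿱宀女木")
print(an_tree, radicals(case_tree))
```

In this representation the layout operators carry the spatial arrangement and the leaves carry the components, which is the kind of "reconstruction recipe" a sequence model could be trained to predict.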

Paper Structure (14 sections, 2 equations, 10 figures, 4 tables)

Figures (10)

  • Figure 1: The consistency and evolution of the roof-like component "宀" in Chinese characters. The first row demonstrates the uniformity in the representation of the radical among various characters during the Oracle Bone Inscriptions period. The second row depicts the evolutionary path of the character "安", highlighting the changes in the roof-like component from 1700 BC through subsequent historical stages to the standardized form in contemporary Chinese script.
  • Figure 2: The decipherment of ancient Chinese characters is treated as a puzzle-solving game, where characters from various periods are first broken down into pieces according to radical strokes. These pieces are then analyzed by the proposed P$^3$ that examines potential evolutionary patterns. In the inference stage, the model predicts a reconstruction recipe when presented with new, unseen samples, thus aiding in decipherment.
  • Figure 3: Examples of the evolution of Chinese characters across seven historical periods in our ACCP dataset. Each row showcases a category of characters, while each column corresponds to a specific period, illustrating the developmental trajectory within the same character category.
  • Figure 4: Flowchart of radical deconstruction.
  • Figure 5: Contour-based and SAM-based methods yield different sets of masks for radicals. These segmented components are tested for their ability to reconstruct the specified modern Chinese characters. Successful reconstructions are retained as final annotations, while failed attempts indicate which segmentation masks to discard.
  • ...and 5 more figures