Puzzle Pieces Picker: Deciphering Ancient Chinese Characters with Radical Reconstruction
Pengjie Wang, Kaile Zhang, Xinyu Wang, Shengwei Han, Yongge Liu, Lianwen Jin, Xiang Bai, Yuliang Liu
TL;DR
This work introduces Puzzle Pieces Picker (P^3), a Transformer-based framework that deciphers ancient Chinese characters by deconstructing them into radicals and reconstructing their modern forms, guided by Ideographic Description Sequences (IDS). It also presents ACCP, a large-scale dataset spanning seven historical periods, with ~90k categories and ~340k images annotated by radical sequences, which enables cross-era learning. The pipeline combines radical decomposition via contour and SAM segmentation, MoCo-based labeling, and cross-era reconstruction. P^3 shows promising decipherment performance across all seven periods, with notable gains when data from Bronze through Kangxi scripts is incorporated. The results indicate that cross-era data fusion can significantly improve decipherment of Oracle Bone Inscriptions (OBI), the oldest script in the collection, and suggest that the approach may apply to other radical-based writing systems. Overall, the work advances paleography by merging radical-level analysis with modern sequence modeling and provides a valuable resource for future research in ancient script decipherment.
Abstract
Oracle Bone Inscriptions (OBI) are among the oldest existing forms of writing in the world. However, because of their great antiquity, a large number of OBI remain undeciphered, making their decipherment one of the global challenges in paleography today. This paper introduces a novel approach, Puzzle Pieces Picker (P$^3$), to decipher these enigmatic characters through radical reconstruction. We deconstruct OBI into foundational strokes and radicals, then employ a Transformer model to reconstruct them into their modern counterparts, offering a new solution for ancient script analysis. To support this effort, we develop a new Ancient Chinese Character Puzzles (ACCP) dataset, comprising an extensive collection of character images from seven key historical stages, annotated with detailed radical sequences. Experiments yield promising results, underscoring the potential and effectiveness of our approach in deciphering ancient Chinese scripts. Through this dataset and methodology, we aim to bridge the gap between traditional philology and modern document analysis techniques, offering new insights into the rich history of Chinese linguistic heritage.
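To make the notion of a radical sequence concrete, the following is a minimal sketch of how an Ideographic Description Sequence can be read as a prefix-notation tree of layout operators and radicals. It is an illustration only, not the paper's implementation: the names `IDC_ARITY`, `parse_ids`, and `radicals` are hypothetical, and the sketch assumes only the twelve Unicode Ideographic Description Characters in the range U+2FF0 to U+2FFB.

```python
# Illustrative sketch (not the authors' code): parse an IDS string into a tree.
# Assumption: only the Ideographic Description Characters U+2FF0..U+2FFB are used.
# All of them are binary except U+2FF2 and U+2FF3, which take three components.
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY["\u2FF2"] = 3  # left, middle, right
IDC_ARITY["\u2FF3"] = 3  # top, middle, bottom


def parse_ids(seq):
    """Parse a prefix-notation IDS string into nested (operator, children) tuples."""

    def walk(i):
        if i >= len(seq):
            raise ValueError("IDS ended before all components were read")
        ch = seq[i]
        arity = IDC_ARITY.get(ch)
        if arity is None:
            # Not a layout operator, so treat it as a leaf radical.
            return ch, i + 1
        children = []
        i += 1
        for _ in range(arity):
            child, i = walk(i)
            children.append(child)
        return (ch, children), i

    tree, end = walk(0)
    if end != len(seq):
        raise ValueError("trailing characters after a complete IDS")
    return tree


def radicals(tree):
    """Flatten a parsed IDS tree into its leaf radicals, in reading order."""
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    return [r for child in children for r in radicals(child)]


if __name__ == "__main__":
    print(parse_ids("⿰女子"))               # the character 好: 女 beside 子
    print(radicals(parse_ids("⿱木⿰木木")))  # the character 森: 木 above two 木
```

In this reading, a sequence model that predicts an IDS token by token is in effect choosing both the radicals and the layout that assembles them, which matches the "deconstruct, then reconstruct" framing used in the abstract.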
