Diff-Oracle: Deciphering Oracle Bone Scripts with Controllable Diffusion Model
Jing Li, Qiu-Feng Wang, Siyuan Wang, Rui Zhang, Kaizhu Huang, Erik Cambria
TL;DR
Diff-Oracle introduces a diffusion-based framework for controllable oracle bone script generation that also improves downstream recognition. It combines a style encoder, which maps style images to CLIP-compatible embeddings, with a content encoder trained on pixel-level paired data produced by CUT, giving control over both style and glyph content. A two-stage training strategy disentangles style and content, and PAIR-like multi-modal guidance with independent content and style scales supports diverse, high-fidelity generation. Empirically, Diff-Oracle achieves state-of-the-art generation metrics and large recognition gains, including 84.62% zero-shot accuracy on OBC306, suggesting its potential as a practical tool for decipherment and archaeological analysis.
Abstract
Deciphering oracle bone scripts plays an important role in Chinese archaeology and philology. However, the scarcity of oracle character images remains a significant challenge. To overcome this issue, we propose Diff-Oracle, a novel approach based on diffusion models that generates a diverse range of controllable oracle characters. Unlike traditional diffusion models that operate primarily on text prompts, Diff-Oracle incorporates a style encoder that uses style reference images to control the generation style. This encoder extracts style prompts from existing oracle character images, converting style details into a text embedding format via a pretrained vision-language model. In addition, a content encoder is integrated within Diff-Oracle to capture content details from content reference images, ensuring that the generated characters accurately represent the intended glyphs. To train Diff-Oracle effectively, we pre-generate pixel-level paired oracle character images (i.e., style and content images) with an image-to-image translation model. Extensive qualitative and quantitative experiments are conducted on the Oracle-241 and OBC306 datasets. Besides significantly surpassing existing generative methods in image generation, Diff-Oracle substantially benefits downstream oracle character recognition, outperforming all existing state-of-the-art methods by a large margin. In particular, on the challenging OBC306 dataset, Diff-Oracle yields an accuracy gain of 7.70% in the zero-shot setting and recognizes unseen oracle character images with an accuracy of 84.62%, setting a new benchmark for deciphering oracle bone scripts.
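
To make the idea of independent content and style guidance scales concrete, the following is a minimal sketch of one common way two conditions can be combined in classifier-free guidance. It is an illustration under stated assumptions, not the exact formulation used by Diff-Oracle: the function name `guided_noise`, its parameters, and the particular decomposition (unconditional, content-only, and content-plus-style noise predictions) are all hypothetical, and plain Python lists stand in for the noise tensors a real denoiser would produce.

```python
from typing import List, Sequence


def guided_noise(
    eps_uncond: Sequence[float],
    eps_content: Sequence[float],
    eps_full: Sequence[float],
    content_scale: float,
    style_scale: float,
) -> List[float]:
    """Combine three noise predictions using separate content and style scales.

    Assumed inputs (hypothetical, for illustration only):
      eps_uncond  - prediction with no conditioning
      eps_content - prediction conditioned on the content image only
      eps_full    - prediction conditioned on both content and style
    """
    return [
        # Start from the unconditional prediction, then move along the
        # content direction and the style direction independently.
        u + content_scale * (c - u) + style_scale * (f - c)
        for u, c, f in zip(eps_uncond, eps_content, eps_full)
    ]


# Toy values standing in for per-pixel noise predictions.
uncond = [0.0, 1.0]
content_only = [1.0, 1.0]
content_and_style = [3.0, 2.0]

# A stronger content scale and a weaker style scale.
mixed = guided_noise(uncond, content_only, content_and_style, 2.0, 0.5)
```

In this sketch, setting both scales to 1 recovers the fully conditioned prediction, while raising one scale strengthens adherence to that condition without changing the weight of the other.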
