Contrastive-to-Self-Supervised: A Two-Stage Framework for Script Similarity Learning

Claire Roman, Philippe Meyer

TL;DR

This work trains an encoder with a contrastive loss on labeled invented alphabets, establishing a teacher model with robust discriminative features. It then extends to historically attested scripts through teacher-student distillation, in which the student learns unsupervised representations guided by the teacher's knowledge while remaining free to discover latent cross-script similarities.

Abstract

Learning similarity metrics for glyphs and writing systems faces a fundamental challenge: while individual graphemes within invented alphabets can be reliably labeled, the historical relationships between different scripts remain uncertain and contested. We propose a two-stage framework that addresses this epistemological constraint. First, we train an encoder with contrastive loss on labeled invented alphabets, establishing a teacher model with robust discriminative features. Second, we extend to historically attested scripts through teacher-student distillation, where the student learns unsupervised representations guided by the teacher's knowledge but free to discover latent cross-script similarities. The asymmetric setup enables the student to learn deformation-invariant embeddings while inheriting discriminative structure from clean examples. Our approach bridges supervised contrastive learning and unsupervised discovery, enabling both hard boundaries between distinct systems and soft similarities reflecting potential historical influences. Experiments on diverse writing systems demonstrate effective few-shot glyph recognition and meaningful script clustering without requiring ground-truth evolutionary relationships.

Paper Structure (34 sections, 10 equations, 3 figures, 1 table)

Figures (3)

  • Figure 1: Two-stage framework. Stage 1: train a teacher encoder $f_{\phi}^*$ with supervised contrastive learning (SupCon) on labeled invented scripts, yielding a discriminative embedding space. Stage 2: adapt to unlabeled historical scripts via teacher-initialized BYOL: a student $f_{\theta}$ with predictor $q_{\theta}$ matches a momentum (EMA) target network $f_{\xi}$ on two augmented views, using stop-gradient, and without cross-script negatives.
  • Figure 2: Datasets used in this work. Omniglot is split into supervised (invented alphabets for Stage 1), unsupervised (historical scripts for Stage 2), and evaluation sets. A complementary Unicode character dataset is generated using Noto fonts. Augmented instances per glyph are obtained through random transformations (red boxes).
  • Figure 3: Two-dimensional t-SNE projections (van der Maaten and Hinton, 2008) of glyph embeddings for the CJK, Greek, and Latin scripts, computed with the teacher model $f_{\phi}^*$ (left) and the student model $f_{\theta}$ (right). Points correspond to glyph instances; colors and markers indicate script labels. The reported $R$ denotes the separability ratio (Eq. \ref{eq:sep_ratio}), illustrating how improved representation geometry yields more coherent cross-script structure.
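
The two training objectives named in the Figure 1 caption can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the standard published forms of the SupCon loss and the BYOL regression loss, and the function names (`supcon_loss`, `byol_loss`, `ema_update`), the temperature of 0.1, and the momentum of 0.99 are hypothetical choices, since this summary does not give the paper's hyperparameters. NumPy has no autograd, so the stop-gradient on the target branch appears only as a comment.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit Euclidean length."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def supcon_loss(embeddings, labels, temperature=0.1):
    """Stage 1 (assumed standard SupCon form): samples sharing a label are positives."""
    z = l2_normalize(embeddings)
    n = z.shape[0]
    logits = z @ z.T / temperature
    not_self = ~np.eye(n, dtype=bool)
    # Subtract the row maximum for numerical stability (does not change the loss).
    logits = logits - logits.max(axis=1, keepdims=True)
    # Denominator sums over every other sample in the batch, excluding the anchor.
    exp_logits = np.exp(logits) * not_self
    log_prob = logits - np.log(exp_logits.sum(axis=1, keepdims=True))
    positives = (labels[:, None] == labels[None, :]) & not_self
    n_pos = positives.sum(axis=1)
    has_pos = n_pos > 0  # anchors without any positive are skipped
    per_anchor = -(log_prob * positives).sum(axis=1)[has_pos] / n_pos[has_pos]
    return float(per_anchor.mean())

def byol_loss(online_prediction, target_projection):
    """Stage 2 (assumed standard BYOL form): 2 - 2 * cosine similarity, batch mean.

    In an autograd framework the target projection would be wrapped in a
    stop-gradient so that only the student and predictor receive gradients.
    No negatives are involved.
    """
    p = l2_normalize(online_prediction)
    t = l2_normalize(target_projection)
    return float(np.mean(2.0 - 2.0 * np.sum(p * t, axis=1)))

def ema_update(target_params, online_params, tau=0.99):
    """Momentum update of the target network: xi <- tau * xi + (1 - tau) * theta."""
    return [tau * t + (1.0 - tau) * o for t, o in zip(target_params, online_params)]
```

In a full training loop, the student and target networks would both start from the Stage 1 teacher weights, `byol_loss` would be applied symmetrically to the two augmented views, and `ema_update` would run after each optimizer step.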