Transformer-Based Vector Font Classification Using Different Font Formats: TrueType versus PostScript
Takumu Fujioka, Gouhei Tanaka
TL;DR
This work investigates how vector font representations affect Transformer-based classification by comparing TrueType and PostScript outline formats. Using a CLS-token Transformer framework, the authors embed outlines either as point sequences (TrueType) or as command sequences (PostScript) and evaluate on font style and weight tasks, including Kanji shapes. Across experiments, PostScript outlines consistently outperform TrueType outlines, with the gains attributed largely to better information aggregation through command-based embeddings and segmentation. The findings guide the choice of outline format in deep learning on vector fonts and point toward patch-based tokenization as a future improvement for both classification and generation tasks.
Abstract
Modern fonts adopt vector-based formats, which ensure scalability without loss of quality. While many deep learning studies on fonts focus on bitmap formats, deep learning for vector fonts remains underexplored. In studies that do apply deep learning to vector fonts, the font representation has often been chosen by convention. However, the representation format is one of the factors that can influence the computational performance of machine learning models on font-related tasks. Here we show that font representations based on PostScript outlines outperform those based on TrueType outlines in Transformer-based vector font classification. TrueType outlines represent character shapes as sequences of points and their associated flags, whereas PostScript outlines represent them as sequences of commands. In previous research, PostScript outlines have predominantly been used when fonts are treated as part of vector graphics, while TrueType outlines have mainly been employed when focusing on fonts alone. The choice between PostScript and TrueType outlines has been determined mainly by file format specifications and by precedent in earlier studies, rather than by performance considerations. To date, few studies have compared which outline format provides better embedding representations. Our findings suggest that information aggregation is crucial in Transformer-based deep learning for vector graphics, much as tokenization is in language models and patch division is in bitmap-based image recognition models. This insight provides valuable guidance for selecting outline formats in future research on vector graphics.
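To make the contrast between the two outline formats concrete, the following minimal Python sketch turns one toy contour into a point-and-flag sequence (TrueType style) and into a command sequence (PostScript style). It is an illustration only, not the embedding used in this work: the class names `TTPoint` and `PSCommand`, the helper functions, the operator vocabulary, the zero-padding scheme, and the coordinates are all assumptions introduced here, and the two toy outlines are not exact conversions of each other.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class TTPoint:
    """One TrueType outline point: coordinates plus an on-curve flag (hypothetical container)."""
    x: float
    y: float
    on_curve: bool


@dataclass
class PSCommand:
    """One PostScript-style path command with its point arguments (hypothetical container)."""
    op: str
    args: List[Point]


# Assumed, simplified operator vocabulary for this sketch.
PS_OPS = {"moveto": 0, "lineto": 1, "curveto": 2, "closepath": 3}
MAX_ARG_POINTS = 3  # a cubic curveto carries three points


def truetype_tokens(contour: List[TTPoint]) -> List[Tuple[float, float, int]]:
    """One token per point: (x, y, on_curve flag)."""
    return [(p.x, p.y, int(p.on_curve)) for p in contour]


def postscript_tokens(commands: List[PSCommand]) -> List[Tuple[float, ...]]:
    """One token per command: (operator id, up to three points, zero-padded)."""
    tokens = []
    for cmd in commands:
        flat = [v for pt in cmd.args for v in pt]
        flat += [0.0] * (2 * MAX_ARG_POINTS - len(flat))  # pad to a fixed width
        tokens.append((float(PS_OPS[cmd.op]), *flat))
    return tokens


# Toy "D"-like contour. Off-curve points are quadratic control points.
tt_outline = [
    TTPoint(0.0, 0.0, True),
    TTPoint(0.0, 100.0, True),
    TTPoint(100.0, 100.0, False),
    TTPoint(100.0, 0.0, False),
]

# A similar shape written as commands with cubic curves (illustrative coordinates).
ps_outline = [
    PSCommand("moveto", [(0.0, 0.0)]),
    PSCommand("lineto", [(0.0, 100.0)]),
    PSCommand("curveto", [(66.0, 100.0), (100.0, 83.0), (100.0, 50.0)]),
    PSCommand("curveto", [(100.0, 17.0), (66.0, 0.0), (0.0, 0.0)]),
    PSCommand("closepath", []),
]

tt_tokens = truetype_tokens(tt_outline)
ps_tokens = postscript_tokens(ps_outline)

print(len(tt_tokens), "point tokens of width", len(tt_tokens[0]))
print(len(ps_tokens), "command tokens of width", len(ps_tokens[0]))
```

In this sketch each point token carries a single coordinate pair, whereas each command token groups the operator with all of its points, so a single token already describes a whole segment. This grouping is one simple way to picture the information aggregation discussed in the abstract.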
