
Transformer-Based Vector Font Classification Using Different Font Formats: TrueType versus PostScript

Takumu Fujioka, Gouhei Tanaka

TL;DR

This work investigates how vector font representations affect Transformer-based classification by comparing the TrueType and PostScript outline formats. Using a Transformer encoder with a CLS token, the authors embed outlines either as point sequences (TrueType) or as command sequences (PostScript) and evaluate them on font style and font weight classification, including Kanji glyphs. Across experiments, PostScript outlines consistently outperform TrueType outlines, a gain attributed largely to better information aggregation through command-based embeddings and segmentation. The findings can guide the choice of outline format in deep learning for vector fonts and point toward patch-based tokenization as a future improvement for both classification and generation tasks.
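To make the contrast between point-sequence and command-sequence inputs concrete, the following Python sketch turns each format into fixed-width token vectors before any learned embedding is applied. The layouts are assumptions for illustration, not the authors' features: a TrueType token holds (x, y, on-curve flag, contour-start flag), and a PostScript token holds a one-hot command type followed by up to three zero-padded (x, y) pairs. The names `truetype_tokens`, `postscript_tokens`, `COMMANDS`, and `MAX_PAIRS` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical command vocabulary for a PostScript-style token layout.
COMMANDS = ("moveTo", "lineTo", "curveTo", "closePath")
MAX_PAIRS = 3  # curveTo carries the most coordinates: two control points + endpoint


def truetype_tokens(contours: Sequence[Sequence[Tuple[float, float, bool]]]) -> List[List[float]]:
    """One 4-dim token per point: x, y, on-curve flag, contour-start flag."""
    tokens = []
    for contour in contours:
        for i, (x, y, on_curve) in enumerate(contour):
            tokens.append([float(x), float(y), 1.0 if on_curve else 0.0, 1.0 if i == 0 else 0.0])
    return tokens


def postscript_tokens(commands: Sequence[Tuple[str, Sequence[Tuple[float, float]]]]) -> List[List[float]]:
    """One 10-dim token per command: 4-way one-hot type + zero-padded (x, y) pairs."""
    tokens = []
    for name, points in commands:
        if name not in COMMANDS:
            raise ValueError(f"unknown command: {name}")
        if len(points) > MAX_PAIRS:
            raise ValueError(f"{name} has more than {MAX_PAIRS} points")
        one_hot = [1.0 if name == c else 0.0 for c in COMMANDS]
        coords = [float(v) for point in points for v in point]
        coords += [0.0] * (2 * MAX_PAIRS - len(coords))  # pad unused coordinate slots
        tokens.append(one_hot + coords)
    return tokens
```

In this layout a single PostScript token already describes a complete drawing operation (for example, a whole cubic segment with both control points and its endpoint), whereas a TrueType token describes one point whose role depends on its neighbors. This is one way to read the information-aggregation explanation given above.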

Abstract

Modern fonts adopt vector-based formats, which ensure scalability without loss of quality. While many deep learning studies on fonts focus on bitmap formats, deep learning for vector fonts remains underexplored. In studies involving deep learning for vector fonts, the choice of font representation has often been made by convention. However, the font representation format is one of the factors that can influence the computational performance of machine learning models in font-related tasks. Here we show that font representations based on PostScript outlines outperform those based on TrueType outlines in Transformer-based vector font classification. TrueType outlines represent character shapes as sequences of points and their associated flags, whereas PostScript outlines represent them as sequences of commands. In previous research, PostScript outlines have been predominantly used when fonts are treated as part of vector graphics, while TrueType outlines are mainly employed when focusing on fonts alone. The choice between PostScript and TrueType outlines has mainly been determined by file format specifications and precedent in previous studies, rather than by performance considerations. To date, few studies have compared which outline format provides better embedding representations. Our findings suggest that information aggregation is crucial in Transformer-based deep learning for vector graphics, as it is for tokenization in language models and patch division in bitmap-based image recognition models. This insight provides valuable guidance for selecting outline formats in future research on vector graphics.
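Because both formats can describe the same contour, a small conversion makes the relationship concrete. The Python sketch below converts one closed TrueType contour, given as (x, y, on_curve) triples, into PostScript-style commands, following the three stages named in the Figure 3 caption: implied on-curve midpoints are inserted between consecutive off-curve points (decomposing the quadratic spline), the points are regrouped into segments, and each quadratic segment is degree-elevated to a cubic one with control points $c_1 = p_0 + \tfrac{2}{3}(q - p_0)$ and $c_2 = p_2 + \tfrac{2}{3}(q - p_2)$. The function names are hypothetical, and this is not the authors' conversion code; a real font pipeline would need to handle more edge cases (degenerate contours, coordinate rounding).

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Command = Tuple[str, List[Point]]


def midpoint(a: Point, b: Point) -> Point:
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)


def quad_to_cubic(p0: Point, q: Point, p2: Point) -> Tuple[Point, Point]:
    """Exact degree elevation of a quadratic Bezier (p0, q, p2) to cubic control points."""
    c1 = (p0[0] + 2 / 3 * (q[0] - p0[0]), p0[1] + 2 / 3 * (q[1] - p0[1]))
    c2 = (p2[0] + 2 / 3 * (q[0] - p2[0]), p2[1] + 2 / 3 * (q[1] - p2[1]))
    return c1, c2


def truetype_contour_to_commands(contour: Sequence[Tuple[float, float, bool]]) -> List[Command]:
    """Convert one closed TrueType contour of (x, y, on_curve) points to PostScript-style commands."""
    if not contour:
        return []
    pts = [((float(x), float(y)), on) for x, y, on in contour]
    n = len(pts)

    # Stage 1: decompose the quadratic spline. Two consecutive off-curve points imply an
    # on-curve point at their midpoint (the contour is cyclic, so wrap around).
    expanded = []
    for i in range(n):
        p, on = pts[i]
        nxt, nxt_on = pts[(i + 1) % n]
        expanded.append((p, on))
        if not on and not nxt_on:
            expanded.append((midpoint(p, nxt), True))

    # Rotate so that the contour starts at an on-curve point.
    start = next(i for i, (_, on) in enumerate(expanded) if on)
    expanded = expanded[start:] + expanded[:start]
    m = len(expanded)

    # Stages 2 and 3: walk the points as segments. Every off-curve point is now followed
    # by an on-curve point, so each one defines a quadratic segment that becomes a curveTo.
    current = expanded[0][0]
    commands: List[Command] = [("moveTo", [current])]
    i = 1
    while i <= m:  # index m wraps back to the start point
        p, on = expanded[i % m]
        if on:
            if i < m:  # the final line back to the start is implied by closePath
                commands.append(("lineTo", [p]))
            current = p
            i += 1
        else:
            end = expanded[(i + 1) % m][0]
            c1, c2 = quad_to_cubic(current, p, end)
            commands.append(("curveTo", [c1, c2, end]))
            current = end
            i += 2
    commands.append(("closePath", []))
    return commands
```

For the contour (0, 0) on, (100, 0) on, (100, 100) off, (0, 100) on, this sketch yields moveTo, lineTo, curveTo, closePath, and the curveTo control points are (100, 200/3) and (200/3, 100).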

Paper Structure

This paper contains 14 sections, 8 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: TrueType outline representation. Each point is assigned an index indicating which contour it belongs to and its position within that contour, and is specified by its location $(x, y)$. The on/off flag determines whether a point is an on-curve point (True) or an off-curve control point (False). Off-curve points act as control points for quadratic Bézier curves, shaping the outline's curvature.
  • Figure 2: PostScript outline representation. Each drawing step is identified by an index. The associated command specifies an operation such as moveTo, lineTo, curveTo, or closePath, and its parameters give the coordinates, including Bézier control points and endpoints.
  • Figure 3: Process of converting a TrueType outline into a PostScript outline. This transformation involves multiple steps, including decomposing quadratic Bézier splines, restructuring point sequences into segments, and converting quadratic Bézier curves into cubic Bézier curves. Each stage is illustrated in the figure, showing how the outline evolves through the transformation.
  • Figure 4: Model architecture for font classification. The input sequence, consisting of vector font outline data, is first mapped to embedding vectors. A classification token (CLS) is prepended to the sequence before being processed by the Transformer Encoder. The encoder consists of multiple layers of multi-head self-attention, feed-forward networks, and layer normalization. The output corresponding to the CLS token is passed to a classifier to predict the font category.
  • Figure 5: Confusion matrices for font style classifications with (a) original TrueType outline and (b) PostScript outline. It can be observed that the PostScript outline performs slightly better than the original TrueType outline.
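The data flow described for Figure 4 can be sketched in dependency-free Python: embed each outline token, prepend a CLS vector, run one self-attention and feed-forward block, and classify from the CLS position alone. This is a toy under explicit assumptions (a single layer and a single head, post-norm residual ordering, layer norm without learnable gain and bias, random untrained weights, no positional encoding); the class name `TinyCLSClassifier` and all sizes are hypothetical, not the paper's configuration.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply an (out x in) weight matrix by an input vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalize to zero mean and unit variance (no learnable gain or bias here)."""
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    return [(v - mu) / math.sqrt(var + eps) for v in x]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyCLSClassifier:
    """Hypothetical one-layer, single-head encoder with a prepended CLS token."""

    def __init__(self, in_dim: int, d_model: int, n_classes: int, seed: int = 0):
        rng = random.Random(seed)
        self.d_model = d_model
        self.w_embed = rand_matrix(d_model, in_dim, rng)  # token features -> embedding
        self.cls = [rng.gauss(0.0, 0.1) for _ in range(d_model)]  # learned in practice
        self.wq = rand_matrix(d_model, d_model, rng)
        self.wk = rand_matrix(d_model, d_model, rng)
        self.wv = rand_matrix(d_model, d_model, rng)
        self.w_ff1 = rand_matrix(2 * d_model, d_model, rng)
        self.w_ff2 = rand_matrix(d_model, 2 * d_model, rng)
        self.w_out = rand_matrix(n_classes, d_model, rng)

    def forward(self, tokens: List[Vector]) -> Vector:
        # 1. Embed each outline token, then prepend the CLS vector at position 0.
        h = [self.cls] + [matvec(self.w_embed, t) for t in tokens]
        # 2. Single-head scaled dot-product self-attention over all positions.
        q = [matvec(self.wq, x) for x in h]
        k = [matvec(self.wk, x) for x in h]
        v = [matvec(self.wv, x) for x in h]
        scale = math.sqrt(self.d_model)
        attended = []
        for qi in q:
            weights = softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k])
            attended.append([sum(w * vj[d] for w, vj in zip(weights, v)) for d in range(self.d_model)])
        # 3. Residual + layer norm, then a position-wise ReLU feed-forward, residual + layer norm.
        h = [layer_norm([a + b for a, b in zip(x, y)]) for x, y in zip(h, attended)]
        ff = [matvec(self.w_ff2, [max(0.0, u) for u in matvec(self.w_ff1, x)]) for x in h]
        h = [layer_norm([a + b for a, b in zip(x, y)]) for x, y in zip(h, ff)]
        # 4. Only the CLS position feeds the classifier head; return class probabilities.
        return softmax(matvec(self.w_out, h[0]))
```

Because this sketch omits positional encoding, its CLS output is unchanged (up to floating-point rounding) when the input tokens are reordered; a real model over outline sequences needs positional information so that the order of points or commands matters.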