Learning to Kern: Set-wise Estimation of Optimal Letter Space

Kei Nakatsuru, Seiichi Uchida

TL;DR

The paper addresses automatic kerning for Latin letters, where $52^2=2704$ spaces must be determined per font. It introduces a pairwise DNN and a set-wise Transformer to estimate all letter-pair spaces, with the set-wise approach leveraging self-attention to enforce global consistency. On a dataset of about 2558 Google Fonts, the set-wise model achieves an average MAE of roughly $5.3$ pixels when the mean space is around $115$ pixels, outperforming FontForge and the pairwise baseline in most cases. This work demonstrates a practical, holistic approach to kerning that can accelerate font design and suggests avenues for designer-guided adjustments and vector-font integration.

Abstract

Kerning is the task of setting appropriate horizontal spaces for all possible letter pairs of a given font. One difficulty of kerning is that the appropriate space differs for each letter pair. Therefore, for a total of 52 capital and small letters, we need to adjust $52 \times 52 = 2704$ different spaces. Another difficulty is that there is neither a general procedure nor a general criterion for automatic kerning; therefore, kerning is still done manually or with heuristics. In this paper, we tackle kerning by proposing two machine-learning models, called the pairwise and set-wise models. The former is a simple deep neural network that estimates the letter space for two given letter images. In contrast, the latter is a transformer-based model that estimates the letter spaces for three or more given letter images. For example, the set-wise model simultaneously estimates 2704 spaces for the 52 letter images of a given font. Of the two models, the set-wise model is not only more efficient but also more accurate because its internal self-attention mechanism allows for more consistent kerning across all letters. Experimental results on about 2500 Google Fonts, together with quantitative and qualitative analyses, show that the set-wise model has an average estimation error of only about 5.3 pixels when the average letter space over all fonts and letter pairs is about 115 pixels.
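
The pair count and the error metric quoted above can be illustrated with a short sketch. This is not the authors' code: the function names are hypothetical, the spacing values are made-up toy data, and the only figures taken from the abstract are the 52 letters, the roughly 5.3-pixel error, and the roughly 115-pixel mean space.

```python
from itertools import product
from string import ascii_lowercase, ascii_uppercase

# 26 capital + 26 small Latin letters, as in the abstract.
LETTERS = ascii_uppercase + ascii_lowercase


def ordered_pairs(letters=LETTERS):
    """Return every ordered (left, right) pair, same-letter pairs included."""
    return list(product(letters, repeat=2))


def mean_absolute_error(predicted, ground_truth):
    """Mean absolute error in pixels over dicts keyed by (left, right) pairs."""
    if predicted.keys() != ground_truth.keys():
        raise ValueError("predicted and ground_truth must cover the same pairs")
    total = sum(abs(predicted[k] - ground_truth[k]) for k in ground_truth)
    return total / len(ground_truth)


pairs = ordered_pairs()

# Toy data (an assumption for illustration, not values from the paper):
# every ground-truth space is 115 px and every estimate is off by 5 px.
ground_truth = {pair: 115.0 for pair in pairs}
predicted = {
    pair: 115.0 + (5.0 if i % 2 == 0 else -5.0) for i, pair in enumerate(pairs)
}

mae = mean_absolute_error(predicted, ground_truth)

# Relative size of the reported error: about 5.3 px against about 115 px.
relative_error = 5.3 / 115.0
```

Here `len(pairs)` is 2704, matching $52 \times 52$, and `relative_error` shows that the reported error is a little under 5% of the average letter space.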

Paper Structure (15 sections, 11 figures, 1 table)

Figures (11)

  • Figure 1: (a) Effect of kerning. (b) Letter spaces vary with font styles and letter pairs. This paper measures the letter space by the horizontal distance between the centers of adjacent letters (marked by red dots).
  • Figure 2: Overview of the proposed models for automatic letter-spacing: (a) The pairwise model and (b) the set-wise model.
  • Figure 3: Letter image examples. For example, the letter image 'A' has a width of 172 pixels, with left and right margins of 41 and 43 pixels, respectively. The red vertical line indicates the horizontal center of gravity of each letter.
  • Figure 4: Mean (left) and variance (right) of the ground-truth spaces of all letter pairs. The vertical and horizontal axes correspond to 'A', $\cdots$, 'Z', 'a', $\cdots$, 'z'.
  • Figure 5: Letter pairs with different space estimation error values. (AE = Absolute error in pixels.) In each pair, the black letters are spaced with the ground-truth. The green and red letters are under-spaced and over-spaced by the specified AE value.
  • ...and 6 more figures
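
The captions of Figures 1 and 3 describe letter space as the horizontal distance between letter centers, with each center marked at the letter's horizontal center of gravity. A minimal sketch of that measurement follows, assuming binary letter images stored as lists of rows (1 = ink) and assuming each image is placed on a shared canvas at a known x-offset. The function names, the image representation, and the offset convention are illustrative assumptions, not details taken from the paper.

```python
def horizontal_center_of_gravity(image):
    """Return the x-coordinate of the ink centroid of a binary image.

    `image` is a list of rows; each row is a list of 0/1 values (1 = ink).
    """
    ink_count = 0
    weighted_sum = 0
    for row in image:
        for x, value in enumerate(row):
            ink_count += value
            weighted_sum += x * value
    if ink_count == 0:
        raise ValueError("image contains no ink pixels")
    return weighted_sum / ink_count


def letter_space(left_image, right_image, left_origin, right_origin):
    """Horizontal distance between the centers of gravity of two letters.

    `left_origin` and `right_origin` are the canvas x-coordinates of each
    image's left edge (a hypothetical placement convention).
    """
    left_center = left_origin + horizontal_center_of_gravity(left_image)
    right_center = right_origin + horizontal_center_of_gravity(right_image)
    return right_center - left_center


# Toy 2x5 and 1x4 "letters" (made-up shapes, not real glyphs).
left = [
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
]
right = [
    [1, 1, 0, 0],
]

space = letter_space(left, right, left_origin=0, right_origin=6)
```

In this toy case the left centroid sits at x = 2.0 and the right centroid at x = 6.5 on the canvas, so `space` is 4.5 pixels.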