Clustering Running Titles to Understand the Printing of Early Modern Books

Nikolai Vogler; Kartik Goyal; Samuel V. Lemley; D. J. Schuldt; Christopher N. Warren; Max G'Sell; Taylor Berg-Kirkpatrick

TL;DR

The paper addresses how to infer printing workflows in early modern books by clustering running titles to reveal the underlying skeleton formes. It introduces two kernel-based approaches, a domain-informed quantized Levenshtein (Lev) kernel and a neural Vision Transformer (ViT) cross-encoder kernel, and integrates them into a spectral clustering framework that exploits sheet-side structure and book gatherings. Evaluation on about 1600 annotated pages from eight books shows that the Lev kernel generally outperforms the neural approach and that incorporating gathering information yields robust forme-based groupings, including perfect clusterings on several held-out formats. The work offers a scalable tool for bibliographic analysis that does not rely on OCR, with potential applications to EEBO-like digital corpora and censorship studies.

Abstract

We propose a novel computational approach to automatically analyze the physical process behind the printing of early modern letterpress books by clustering the running titles found at the top of their pages. Specifically, we design and compare custom neural and feature-based kernels for computing the pairwise visual similarity of a scanned document's running titles, and we cluster the titles in order to track any deviations from the expected pattern of a book's printing. Unlike body text, which must be reset for every page, the running titles are among the static type elements in a skeleton forme, i.e., the frame used to print each side of a sheet of paper, and were often reused during a book's printing. To evaluate the effectiveness of our approach, we manually annotate the running title clusters on about 1600 pages across 8 early modern books of varying sizes and formats. Our method can detect potential deviations from the expected patterns of such skeleton formes, which helps bibliographers understand phenomena associated with a text's transmission, such as censorship. We also validate our results against a manual bibliographic analysis of a counterfeit early edition of Thomas Hobbes' Leviathan (1651).
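To make the clustering step in the abstract concrete, the following is a minimal sketch of two-way normalized spectral clustering on a precomputed pairwise similarity matrix. It is an illustration under stated assumptions, not the paper's implementation: the function name `spectral_bipartition`, the restriction to exactly two clusters, the power-iteration solver, and the toy similarity values are all hypothetical choices made to keep the example dependency-free.

```python
import math

def spectral_bipartition(A, iters=500):
    """Split items into two clusters given a symmetric similarity matrix A.

    Hypothetical helper: uses the sign of the second eigenvector of the
    normalized affinity D^{-1/2} A D^{-1/2}. Assumes non-negative entries
    and strictly positive row sums.
    """
    n = len(A)
    deg = [sum(row) for row in A]                 # degree of each item
    inv_sqrt = [1.0 / math.sqrt(d) for d in deg]
    # Shifted normalized affinity (M + I) / 2 has eigenvalues in [0, 1],
    # so power iteration converges to the largest remaining eigenvalue.
    M = [[0.5 * (A[i][j] * inv_sqrt[i] * inv_sqrt[j] + (1.0 if i == j else 0.0))
          for j in range(n)] for i in range(n)]
    # The top eigenvector is known in closed form: proportional to sqrt(deg).
    norm = math.sqrt(sum(deg))
    top = [math.sqrt(d) / norm for d in deg]

    v = [float(i + 1) for i in range(n)]          # deterministic start vector
    for _ in range(iters):
        # Deflation: remove the component along the top eigenvector.
        dot = sum(a * b for a, b in zip(v, top))
        v = [a - dot * b for a, b in zip(v, top)]
        # One power-iteration step, followed by renormalization.
        v = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        length = math.sqrt(sum(a * a for a in v))
        v = [a / length for a in v]
    # Rescale by D^{-1/2} and split on the sign of each coordinate.
    return [0 if v[i] * inv_sqrt[i] < 0 else 1 for i in range(n)]

# Toy similarities (made up): items 0 and 2 resemble each other, as do 1 and 3.
A = [
    [1.0, 0.1, 0.9, 0.1],
    [0.1, 1.0, 0.1, 0.9],
    [0.9, 0.1, 1.0, 0.1],
    [0.1, 0.9, 0.1, 1.0],
]
labels = spectral_bipartition(A)
```

Cluster ids are arbitrary, so only the grouping is meaningful: here items 0 and 2 receive one label and items 1 and 3 the other. A book printed from more than two skeleton formes would need several eigenvectors followed by a step such as k-means, which this sketch leaves out.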

Paper Structure (17 sections, 2 equations, 6 figures, 2 tables)

Introduction
Background: The Making of Early Modern Books
Approach: Spectral Clustering with Custom Kernels for Running Title Variation
Kernels for Analyzing Kerning Variation
Quantized Levenshtein Kernel (Lev):
Neural Cross-Encoder Vision Transformer Kernel (ViT):
Sheet side similarity via reduction:
Spectral Clustering of Sheet Sides
Dataset: A Suite of Skeleton Forme Clustering Tasks
Results
Qualitative Bibliographic Findings
Related Work: Analytical & Computational
Analytical Bibliography:
Computational Methods:
Conclusion
...and 2 more sections

Figures (6)

Figure 1: In this paper, we propose a new task that clusters running titles (colored rectangles) of early modern books into the underlying skeleton formes (outlined in gray dotted line) used to print them. We show that by leveraging additional information about how the book was printed, namely the gathering structure of the book, we can cluster sheet sides instead of pages or recto pages (i.e., right side pages), which greatly improves performance.
Figure 2: "Skeleton forme" and function in early modern printing. On the left, two schematic metal skeleton formes $F_1$ and $F_2$, used in the printing of Leviathan, are shown. In red, we highlight subtle differences in glyph anatomy and spacing between the headlines of $F_1$ and $F_2$. Underneath this, we display real groups of headlines from Leviathan that are printed using the schematic skeleton formes $F_1$ and $F_2$ found by clustering on such spacing. Each group is printed from the same skeleton forme with consistent running titles. At the top right, we show how this book was folded and nested into constituent gatherings, with color-coded headlines corresponding to the actual headlines underneath. We note that body text is not shown because it is reset while printing.
Figure 3: Proposed custom feature-based Levenshtein and neural cross-encoder Vision Transformer-based similarity kernels for comparing running titles of sheet sides in early modern printed books. We compute pairwise similarities between corresponding running title images on different sheet sides across the entire book, where $s_i$ and $s_j$ denote the two sheet sides being compared, each containing $n$ running titles whose positions and number are predetermined by the format of the book. Here, $s_i^{(k)}$ represents the running title at the $k$-th position on sheet side $s_i$. The kernel functions combine the $n$ per-position similarity computations between running titles at corresponding positions on sheet sides $s_i$ and $s_j$ via a reduction operation $\bigodot$ (see the section on kernels). Using these similarities, we cluster the kernel matrix $A$ to discover sheet sides printed with the same underlying skeleton forme. Running titles from the 'Of Darkness' section of Leviathan are shown binarized.
Figure 4: Annotated differences used to manually discover the latent clustering of skeleton formes for King Lear, which was printed in the quarto format (i.e., 4 pages printed per side of a sheet). Features that were used to discriminate between clusters during the annotation process are highlighted and described in red. In this figure, each row represents a sheet side, with each of the corresponding extracted headlines appearing in the same column. Our proposed sheet side clustering model perfectly partitions these skeleton forme clusters, as shown in the main results table.
Figure 5: We show ground truth (bottom) and predicted (top) skeleton forme cluster assignments as they appear across the four sections (Of Man, Of Commonwealth, Christian Commonwealth, Of Darkness) of Leviathan, from the start of the book on the left to the end on the right. On the x-axis, we label the gathering IDs as they appear (and repeat) throughout the book. Each blue dot represents a page; the dots occur in contiguous pairs because we unfold the book's folio format (two pages per sheet side). For the ground truth, we summarize what the different skeleton forme usage reveals about the making of the book, as initially reported by Warren et al. (2021). Finally, we annotate on the top plot where our predicted clusters succeed and fail against the ground truth findings. See the Qualitative Bibliographic Findings section for more details.
...and 1 more figures
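
The sheet-side comparison described in the Figure 3 caption (per-position title similarities combined by a reduction) can be sketched as follows. This is a hedged illustration only: the captions do not specify which features the Lev kernel quantizes or which reduction is used, so the column run-length features, the `bin_width` parameter, the normalized similarity score, the mean reduction, and all function names here are assumptions rather than the authors' method.

```python
from statistics import fmean

def column_runs(image):
    """Run-length encode ink/blank columns of a binarized image (rows of 0/1)."""
    width = len(image[0])
    has_ink = [any(row[x] for row in image) for x in range(width)]
    runs, start = [], 0
    for x in range(1, width + 1):
        if x == width or has_ink[x] != has_ink[start]:
            runs.append((has_ink[start], x - start))
            start = x
    return runs

def quantize(runs, bin_width=2):
    """Map each run to a coarse symbol so that tiny scan noise is ignored."""
    return [("I" if ink else "B", length // bin_width) for ink, length in runs]

def levenshtein(a, b):
    """Classic edit distance between two sequences (two-row dynamic program)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

def title_similarity(img_a, img_b):
    """Similarity in [0, 1]; 1 means identical quantized spacing sequences."""
    qa, qb = quantize(column_runs(img_a)), quantize(column_runs(img_b))
    return 1.0 - levenshtein(qa, qb) / max(len(qa), len(qb), 1)

def sheet_side_similarity(side_i, side_j, reduction=fmean):
    """Compare titles position by position, then reduce to a single score."""
    assert len(side_i) == len(side_j), "sheet sides must share a format"
    return reduction(title_similarity(a, b) for a, b in zip(side_i, side_j))

# Toy one-row "images" (made up): the second title has a wider gap between glyphs.
narrow_gap = [[1, 1, 0, 0, 1, 1]]
wide_gap = [[1, 1, 0, 0, 0, 0, 1, 1]]

same = sheet_side_similarity([narrow_gap, wide_gap], [narrow_gap, wide_gap])
mixed = sheet_side_similarity([narrow_gap, narrow_gap], [narrow_gap, wide_gap])
```

In this toy case `same` is 1.0 and `mixed` is lower, because one of the two positions differs by a single quantized symbol. Filling a matrix with such scores for every pair of sheet sides would give a kernel matrix of the kind the caption calls $A$; swapping `reduction` for `min` or a product is an equally plausible reading of $\bigodot$.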
