Historical Printed Ornaments: Dataset and Tasks
Sayan Kumar Chaki, Zeynep Sonat Baltaci, Elliot Vincent, Remi Emonet, Fabienne Vial-Bonacci, Christelle Bahier-Porte, Mathieu Aubry, Thierry Fournel
TL;DR
This paper applies unsupervised computer vision to the analysis of historical printed ornaments. It introduces the Rey's Ornaments dataset, built from Marc-Michel Rey's books, to study three tasks: clustering, element discovery, and unsupervised change localization. It provides a benchmark for each task and evaluates state-of-the-art methods on them, including a synthetic pretraining pipeline for element discovery. The key finding is that simple baselines such as k-means in pixel space can outperform more sophisticated models on this data, while many unsupervised methods struggle with real historical variability and tight annotations. The work highlights the need for task-specific definitions of change and for robust modeling of background and ink variations, and it releases the dataset and codebase to spur further research in document and historical object analysis.
Abstract
This paper aims to develop the study of historical printed ornaments with modern unsupervised computer vision. We highlight three complex tasks that are of critical interest to book historians: clustering, element discovery, and unsupervised change localization. For each of these tasks, we introduce an evaluation benchmark, and we adapt and evaluate state-of-the-art models. Our Rey's Ornaments dataset is designed to be a representative example of a set of ornaments historians would be interested in. It focuses on an 18th-century bookseller, Marc-Michel Rey, providing a consistent set of ornaments with wide diversity and representative challenges. Our results highlight the limitations of state-of-the-art models when faced with real data and show that simple baselines such as k-means or congealing can outperform more sophisticated approaches on such data. Our dataset and code can be found at https://printed-ornaments.github.io/.
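To make the k-means baseline concrete, the following is a minimal sketch of clustering images directly in pixel space, written with NumPy only. It is not the authors' implementation: the function names `kmeans_pixels` and `make_toy_ornaments`, the farthest-point initialisation, the number of clusters, and the synthetic two-pattern data are all assumptions made for illustration, and the text above does not specify any preprocessing or hyperparameters.

```python
import numpy as np


def kmeans_pixels(images, k, n_iter=50):
    """Lloyd's k-means on flattened pixel intensities (illustrative sketch).

    images: array of shape (n, h, w), grayscale, all the same size.
    Returns (labels, centroids) with centroids reshaped to (k, h, w).
    """
    x = images.reshape(len(images), -1).astype(np.float64)

    # Assumption: deterministic farthest-point initialisation, starting
    # from the first image. The source does not say how centroids are seeded.
    idx = [0]
    d_min = ((x - x[0]) ** 2).sum(axis=1)
    for _ in range(1, k):
        nxt = int(d_min.argmax())
        idx.append(nxt)
        d_min = np.minimum(d_min, ((x - x[nxt]) ** 2).sum(axis=1))
    centroids = x[idx].copy()

    labels = np.zeros(len(x), dtype=int)
    for it in range(n_iter):
        # Squared Euclidean distance from every image to every centroid.
        d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d.argmin(axis=1)
        if it > 0 and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return labels, centroids.reshape(k, *images.shape[1:])


def make_toy_ornaments(n_per_class=20, size=16, seed=0):
    """Hypothetical stand-in data: two noisy binary patterns, not real ornaments."""
    rng = np.random.default_rng(seed)
    a = np.zeros((size, size))
    a[:, : size // 2] = 1.0  # left half "inked"
    b = np.zeros((size, size))
    b[: size // 2, :] = 1.0  # top half "inked"
    templates = [a] * n_per_class + [b] * n_per_class
    imgs = np.stack([t + 0.1 * rng.standard_normal((size, size)) for t in templates])
    truth = np.array([0] * n_per_class + [1] * n_per_class)
    return imgs, truth


if __name__ == "__main__":
    imgs, truth = make_toy_ornaments()
    labels, centroids = kmeans_pixels(imgs, k=2)
    print("cluster sizes:", np.bincount(labels))
```

Each centroid is itself an image, the pixel-wise mean of its cluster, so it can be viewed directly as a cluster prototype. On real scans, differences in size, alignment, and inking between impressions would presumably need handling before pixel distances are meaningful; this sketch ignores that.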
