Judging a Book By its Cover
Brian Kenji Iwana, Syed Tahseen Raza Rizvi, Sheraz Ahmed, Andreas Dengel, Seiichi Uchida
TL;DR
This work investigates whether book covers encode detectable genre signals that a CNN can learn. By fine-tuning an ImageNet-pretrained AlexNet on a large, expertly labeled book-cover dataset, the study shows that visual cues (colors, objects, text, and layout) enable partial genre classification: Top-1 accuracy is around 24.7% across 30 classes, and Top-3 accuracy is higher. The analysis also shows that covers can be misleading and ambiguous, which motivates future work on multi-label classification and on incorporating textual features. The paper identifies concrete design principles that the model leverages, offering practical insights for cover design and automated genre tagging. A publicly available dataset of 137,788 covers supports further pattern-recognition research in this domain.
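To make the Top-1 and Top-3 figures concrete, here is a minimal sketch of how top-k accuracy is commonly computed from per-class scores. It is not the authors' evaluation code: the function name `top_k_accuracy`, the toy scores, and the labels are all hypothetical, and the chance levels assume the 30 classes are balanced.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]],
                   labels: Sequence[int],
                   k: int) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score; ties go to the lower index.
        top_k = sorted(range(len(row)), key=lambda c: -row[c])[:k]
        hits += label in top_k
    return hits / len(labels)


# Chance levels for 30 genres, ASSUMING balanced classes (an assumption, not a
# figure taken from the text above).
NUM_CLASSES = 30
chance_top1 = 1 / NUM_CLASSES  # about 0.033
chance_top3 = 3 / NUM_CLASSES  # 0.1

# Hypothetical toy data: 4 covers scored over 5 classes.
toy_scores = [
    [0.10, 0.60, 0.20, 0.05, 0.05],
    [0.50, 0.30, 0.10, 0.05, 0.05],
    [0.20, 0.10, 0.40, 0.25, 0.05],
    [0.05, 0.10, 0.15, 0.30, 0.40],
]
toy_labels = [1, 1, 4, 3]

top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top3 = top_k_accuracy(toy_scores, toy_labels, k=3)
```

Under the balanced-class assumption, a Top-1 accuracy of about 24.7% would be several times the roughly 3.3% expected from random guessing, which is why the result is read as evidence of a learnable, if partial, genre signal.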
Abstract
Book covers communicate information to potential readers, but can that same information be learned by computers? We propose using a deep Convolutional Neural Network (CNN) to predict the genre of a book based on the visual clues provided by its cover. The purpose of this research is to investigate whether relationships between books and their covers can be learned. However, determining the genre of a book is a difficult task because covers can be ambiguous and genres can be overarching. Despite this, we show that a CNN can extract features and learn the underlying design rules that designers use to define a genre. Machine learning can thus bring the large amount of available resources to bear on the book cover design process. In addition, we present a new, challenging dataset that can be used for many pattern recognition tasks.
