Judging a Book By its Cover

Brian Kenji Iwana, Syed Tahseen Raza Rizvi, Sheraz Ahmed, Andreas Dengel, Seiichi Uchida

TL;DR

This work investigates whether book covers encode detectable genre signals that a CNN can learn. By fine-tuning an ImageNet-pretrained AlexNet on a large, expertly labeled book-cover dataset, the study demonstrates that visual cues—colors, objects, text, and layout—enable partial genre classification, with Top-1 accuracy around 24.7% across 30 classes, and higher Top-3 accuracy. The analysis reveals that covers can be misleading and ambiguous, motivating future work in multi-label classification and combining textual features. The paper also identifies concrete design principles that the model leverages, offering practical insights for cover design and automated genre tagging. A publicly available dataset of 137,788 covers supports further pattern-recognition research in this domain.
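The Top-1 and Top-3 figures mentioned above are instances of top-k accuracy: a prediction counts as correct when the true genre is among the k highest-scoring classes. The following is a minimal sketch of that metric, not the authors' evaluation code; the function name `top_k_accuracy` and the toy scores are assumptions made for illustration.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int,
) -> float:
    """Fraction of samples whose true label is among the k highest scores.

    `scores[i][c]` is the model's score for class c on sample i.
    This is a hypothetical helper, not taken from the paper.
    """
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and equal length")
    hits = 0
    for row, label in zip(scores, labels):
        # Rank class indices from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy example with 3 classes (made-up numbers, not results from the paper).
toy_scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
toy_labels = [1, 1, 0]

top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top2 = top_k_accuracy(toy_scores, toy_labels, k=2)
```

Because the top-k set always contains the top-1 prediction, top-k accuracy can never be lower than Top-1 accuracy on the same data, which is why a Top-3 score exceeds the Top-1 score whenever any true genre is ranked second or third.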

Abstract

Book covers communicate information to potential readers, but can that same information be learned by computers? We propose using a deep Convolutional Neural Network (CNN) to predict the genre of a book based on the visual clues provided by its cover. The purpose of this research is to investigate whether relationships between books and their covers can be learned. However, determining the genre of a book is a difficult task because covers can be ambiguous and genres can be overarching. Despite this, we show that a CNN can extract features and learn underlying design rules set by the designer to define a genre. Using machine learning, we can bring the large amount of resources available to the book cover design process. In addition, we present a new challenging dataset that can be used for many pattern recognition tasks.

Paper Structure

This paper contains 14 sections, 1 equation, 9 figures, 1 table.

Figures (9)

  • Figure 1: Sample test set images from the "Cookbooks, Food & Wine" category. The top row shows the cover images and the bottom row shows their respective softmax activations from AlexNet. The blue bar is the correct class and the red bars are the other classes. Only the top 5 highest activations are displayed. (a) shows examples of correctly classified books and (b) shows examples of books belonging to "Cookbooks, Food & Wine" that were misclassified as other classes.
  • Figure 2: The "Biographies & Memoirs" book covers that were classified by AlexNet as "History." Although misclassified, many of these books can also relate to "History" despite the ground truth.
  • Figure 3: Visualization of the output layer softmax activations of AlexNet. Each point is a 30-dimensional vector where each dimension is the probability of each output class. For visualization purposes, the points are mapped into 2-dimensional subspace with PCA. The arrows represent the axes of each class. The class ground truth is represented by colors, chosen at random. Sample images with high activations from each class are enlarged.
  • Figure 4: Book covers from genres with particular color associations. Each example was correctly classified by AlexNet.
  • Figure 5: Book covers that were successfully classified on the basis of the common moods or color palettes of their respective genres.
  • ...and 4 more figures
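
The captions for Figures 1 and 3 both refer to the softmax activations of the output layer, with Figure 1 showing only the five highest per cover. As a rough illustration of that step, the sketch below turns raw class scores into softmax probabilities and keeps the top entries. The helper names and the logit values are hypothetical; only the three genre names come from the captions above.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_activations(
    logits: Sequence[float],
    class_names: Sequence[str],
    n: int = 5,
) -> List[Tuple[str, float]]:
    """Return the n (class name, probability) pairs with the highest probability."""
    probs = softmax(logits)
    ranked = sorted(zip(class_names, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


# Hypothetical logits for one cover over three of the dataset's genres.
genres = ["Cookbooks, Food & Wine", "History", "Biographies & Memoirs"]
example_logits = [2.0, 0.5, 1.0]

top_two = top_activations(example_logits, genres, n=2)
for name, prob in top_two:
    print(f"{name}: {prob:.3f}")
```

In the paper's setting each cover would yield a 30-dimensional probability vector, one entry per genre; the same ranking step applies unchanged.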