Illustrating Classic Brazilian Books using a Text-To-Image Diffusion Model

Felipe Mahlow, André Felipe Zanella, William Alberto Cruz Castañeda, Regilene Aparecida Sarzi-Ribeiro

TL;DR

The paper investigates the feasibility of using a two-stage text-to-image diffusion pipeline, Stable Diffusion XL Base 1.0 followed by Refiner 1.0, to illustrate seven public-domain Brazilian books. It combines careful prompt design with large-scale generation (1500 images per book) and evaluates the results with CLIP Score, which measures semantic alignment between prompt and image, and Inception Score, which measures image quality and diversity. The study finds that prompt specificity significantly affects the visual outcome and that the model shows biases toward white representations. Performance varies across works: Horto achieves the highest CLIP Score and Policarpo Quaresma the highest Inception Score. The work highlights both the potential and the limitations of AI-generated literary illustrations and suggests directions for future prompt engineering and dataset diversification to improve fidelity and inclusivity.
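To make the two evaluation metrics concrete, the following is a minimal sketch of their standard definitions, not the paper's own evaluation code. It assumes the embeddings and class probabilities have already been computed (in practice by a CLIP encoder and an Inception-v3 classifier); the function names, the toy inputs, and the scaling of CLIP Score by 100 are illustrative assumptions.

```python
import math


def clip_score(image_emb, text_emb):
    """CLIP Score for one image/prompt pair: 100 * max(cosine similarity, 0).

    The factor 100 is one common convention (an assumption here);
    the inputs stand in for CLIP image and text embeddings.
    """
    dot = sum(a * b for a, b in zip(image_emb, text_emb))
    norm_image = math.sqrt(sum(a * a for a in image_emb))
    norm_text = math.sqrt(sum(b * b for b in text_emb))
    return 100.0 * max(dot / (norm_image * norm_text), 0.0)


def inception_score(probs):
    """Inception Score: exp of the mean KL(p(y|x) || p(y)) over all images.

    `probs` is a list of per-image class distributions, each summing to 1.
    """
    n_images = len(probs)
    n_classes = len(probs[0])
    # Marginal class distribution p(y), averaged over all images.
    marginal = [sum(row[j] for row in probs) / n_images for j in range(n_classes)]
    kl_total = 0.0
    for row in probs:
        # Terms with p == 0 contribute nothing to the KL divergence.
        kl_total += sum(p * math.log(p / marginal[j]) for j, p in enumerate(row) if p > 0)
    return math.exp(kl_total / n_images)
```

Under these definitions, a higher CLIP Score means the image embedding points in nearly the same direction as the prompt embedding. A higher Inception Score requires both confident per-image predictions and a spread of predicted classes across the image set.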

Abstract

In recent years, Generative Artificial Intelligence (GenAI) has undergone a profound transformation in addressing intricate tasks involving diverse modalities such as textual, auditory, visual, and pictorial generation. Within this spectrum, text-to-image (TTI) models have emerged as a formidable approach to generating varied and aesthetically appealing compositions, spanning applications from artistic creation to realistic facial synthesis, and demonstrating significant advancements in computer vision, image processing, and multimodal tasks. The advent of Latent Diffusion Models (LDMs) signifies a paradigm shift in the domain of AI capabilities. This article delves into the feasibility of employing the Stable Diffusion LDM to illustrate literary works. For this exploration, seven classic Brazilian books have been selected as case studies. The objective is to ascertain the practicality of this endeavor and to evaluate the potential of Stable Diffusion in producing illustrations that augment and enrich the reader's experience. We will outline the beneficial aspects, such as the capacity to generate distinctive and contextually pertinent images, as well as the drawbacks, including any shortcomings in faithfully capturing the essence of intricate literary depictions. Through this study, we aim to provide a comprehensive assessment of the viability and efficacy of utilizing AI-generated illustrations in literary contexts, elucidating both the prospects and challenges encountered in this pioneering application of technology.

Paper Structure

This paper contains 14 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: Examples of generated images for (a, b) Dom Casmurro, (c) A Viúva Simões, and (d, e) Senhora.
  • Figure 2: Examples of generated images for (a) Os Sertões, (b) Horto, (c) A Viúva Simões, (d) O Cortiço, and (e) O Triste Fim de Policarpo Quaresma.