Synthetic History: Evaluating Visual Representations of the Past in Diffusion Models

Maria-Teresa De Rosa Palmini, Eva Cetinic

TL;DR

This work examines how text-to-image diffusion models depict historical contexts, a dimension that prior bias research has largely left unexplored. It introduces a benchmark that pairs HistVis, a dataset of 30,000 synthetic images generated by three diffusion models from 100 prompts spanning 20 activities and 10 historical periods, with a reproducible evaluation protocol. The study analyzes three dimensions (Implicit Stylistic Associations, Historical Consistency, and Demographic Representation) and finds systematic biases: strong period-specific visual defaults, frequent anachronisms, and demographic patterns that diverge from historically plausible baselines. The open benchmark and its analyses offer a starting point for improving historical fidelity and mitigating bias in diffusion-based image generation, with implications for education, cultural heritage, and public understanding of the past.

Abstract

As Text-to-Image (TTI) diffusion models become increasingly influential in content creation, growing attention is being directed toward their societal and cultural implications. While prior research has primarily examined demographic and cultural biases, the ability of these models to accurately represent historical contexts remains largely underexplored. To address this gap, we introduce a benchmark for evaluating how TTI models depict historical contexts. The benchmark combines HistVis, a dataset of 30,000 synthetic images generated by three state-of-the-art diffusion models from carefully designed prompts covering universal human activities across multiple historical periods, with a reproducible evaluation protocol. We evaluate generated imagery across three key aspects: (1) Implicit Stylistic Associations: examining default visual styles associated with specific eras; (2) Historical Consistency: identifying anachronisms such as modern artifacts in pre-modern contexts; and (3) Demographic Representation: comparing generated racial and gender distributions against historically plausible baselines. Our findings reveal systematic inaccuracies in historically themed generated imagery, as TTI models frequently stereotype past eras by incorporating unstated stylistic cues, introduce anachronisms, and fail to reflect plausible demographic patterns. By providing a reproducible benchmark for historical representation in generated imagery, this work provides an initial step toward building more historically accurate TTI models.

Paper Structure

This paper contains 52 sections, 2 equations, 32 figures, 17 tables.

Figures (32)

  • Figure 1: Overview of the benchmark: (1) Prompt Design, (2) HistVis Dataset of synthetic images, and (3) Evaluation of stylistic bias, historical consistency, and demographic representation.
  • Figure 2: Examples of generated images reflecting different stylistic biases when a specific time period is added to the prompt “A person expressing joy after achieving a goal”.
  • Figure 3: Examples of generated images with anachronisms identified by our two-stage method: headphones in the 18th century, a vacuum cleaner in the 19th century, a laptop in the 1930s, and a smartphone in the 1950s.
  • Figure 4: Top 15 anachronistic elements per model, ranked by frequency (x-axis) and severity (y-axis). Circles indicate elements in the top 15 by frequency, triangles by severity, and diamonds by both.
  • Figure 5: Examples of generated images illustrating demographic overrepresentation across models, time periods, and activities.
  • ...and 27 more figures