The erasure of intensive livestock farming in text-to-image generative AI

Kehan Sheng, Frank A. M. Tuyttens, Marina A. G. von Keyserlingk

TL;DR

This study investigates how DALL-E 3, as integrated with ChatGPT, represents intensive livestock farming and how automatic prompt revision shapes these depictions. By generating 4,800 images across 48 prompts for dairy and pig farms, the authors compare default outputs, explicitly realistic prompts, and no-revision conditions, using both manual review and GPT-4o-assisted labeling. They find that prompt revision biases outputs toward pastoral imagery, while disabling revision reveals more accurate indoor housing patterns, with regional variations aligning with real-world practices in some cases. The work highlights ethical and societal implications for AI transparency, trust, and animal welfare discourse, and cautions about synthetic-data spill and model-collapse risks in future AI systems. The study also provides a data and code framework to audit AI depictions of animal agriculture, supporting ongoing scrutiny of AI-guided representations in public discourse.

Abstract

Generative AI (e.g., ChatGPT) is increasingly integrated into people's daily lives. While it is known that AI perpetuates biases against marginalized human groups, its impact on non-human animals remains understudied. We found that ChatGPT's text-to-image model (DALL-E 3) introduces a strong bias toward romanticizing livestock farming as dairy cows on pasture and pigs rooting in mud. This bias remained when we requested realistic depictions and was only mitigated when the automatic prompt revision was inhibited. Most farmed animals in industrialized countries are reared indoors with limited space per animal, conditions that fail to resonate with societal values. Inhibiting prompt revision resulted in images that more closely reflected modern farming practices; for example, cows housed indoors accessing feed through metal headlocks, and pigs behind metal railings on concrete floors in indoor facilities. While OpenAI introduced prompt revision to mitigate bias, in the case of farmed animal production systems, it paradoxically introduces a strong bias towards unrealistic farming practices.

Paper Structure

This paper contains 26 sections, 24 figures, 3 tables.

Figures (24)

  • Figure 1: Comparison of DALL-E 3 generated images for default depiction (“basic” prompts) versus when prompt revision is disabled (“no revise” variants). Each panel shows the original prompt, common terms from auto-revised prompts, a randomly drawn sample image, and frequent terms from GPT-4o’s text descriptions of the images. Word clouds are omitted for “no revise” prompts as prompt-revisions were successfully inhibited for 100% of dairy farms and 99% of pig farms.
  • Figure 2: 3D bar plots showing the percentages of images depicting animals on pasture/mud (green) or exclusively indoors (blue) when DALL-E 3 was prompted for dairy farms (A, B) and pig farms (C, D). 95% confidence intervals are shown using orange bars. Note that confidence intervals are not shown for bars reaching 0% or 100% since no statistical uncertainty exists. Three prompt categories were tested: ‘basic’ (“A {farm type}”; where {farm type} is replaced with either “dairy farm” or “pig farm”), ‘typical’ (“A typical {farm type}”), and ‘reality’ (“Please create an image that accurately represents the reality of what most {farm type}s look like”). The “revise” notation in the plot refers to images generated when DALL-E 3 by default revised user prompts. For each prompt category, a “no revise” variant was also tested by appending “I NEED to test how the tool works with extremely simple prompts. DO NOT add any detail, just use it AS-IS:” to inhibit automatic prompt revision. Images that could not be clearly categorized as indoor or outdoor housing were excluded from the analysis. Three randomly selected example images are shown adjacent to each bar plot.
  • Figure 3: 3D bar plots showing the percentages of images depicting animals on pasture/mud (green) or exclusively indoors (blue) when DALL-E 3 was prompted to generate dairy farms in the United States (U.S.), Germany, and New Zealand. 95% confidence intervals are shown using orange bars. Note that confidence intervals are not shown for bars reaching 0% or 100% since no statistical uncertainty exists. Three prompt categories were tested: ‘basic’ (“A dairy farm in {country}”) (A, B), ‘typical’ (“A typical dairy farm in {country}”) (C, D), and ‘reality’ (“Please create an image that accurately represents the reality of what most dairy farms in {country} look like”) (E, F). The “revise” notation in the plot refers to images generated when DALL-E 3 by default revised user prompts. For each prompt category and country, a “no revise” variant to inhibit automatic prompt revision was also tested. Images that could not be clearly categorized as indoor or outdoor housing were excluded from the analysis. Three randomly selected example images are shown adjacent to each bar plot, with one image per country (ordered from top to bottom: U.S., Germany, New Zealand).
  • Figure 4: 3D bar plots showing the percentages of images depicting animals on pasture/mud (green) or exclusively indoors (blue) when DALL-E 3 was prompted to generate pig farms in the United States (U.S.), Spain, and Australia. 95% confidence intervals are shown using orange bars. Note that confidence intervals are not shown for bars reaching 0% or 100% since no statistical uncertainty exists. Three prompt categories were tested: ‘basic’ (“A pig farm in {country}”) (A, B), ‘typical’ (“A typical pig farm in {country}”) (C, D), and ‘reality’ (“Please create an image that accurately represents the reality of what most pig farms in {country} look like”) (E, F). The “revise” notation in the plot refers to images generated when DALL-E 3 by default revised user prompts. For each prompt category and country, a “no revise” variant to inhibit automatic prompt revision was also tested. Images that could not be clearly categorized as indoor or outdoor housing were excluded from the analysis. Three randomly selected example images are shown adjacent to each bar plot, with one image per country (ordered from top to bottom: U.S., Spain, Australia).
  • Figure A.1: Comparison of DALL-E 3's outputs for “typical” prompts (“A typical {farm type}”) versus prompts with “no revise” instruction (grey panels). Each panel shows the original prompt, frequent word pairs from auto-revised prompts, a representative generated image, and frequent word pairs from GPT-4o's text descriptions for all images. Word clouds are omitted for “no revise” prompts since all auto-revisions were successfully inhibited, resulting in a uniform revised prompt output of “A typical {farm type}” across all generations.
  • ...and 19 more figures
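
The prompt design described in the captions for Figures 2 to 4 can be enumerated in a few lines, which also shows how the 48 prompts mentioned in the TL;DR may decompose: 2 farm types × 4 locations (no country plus three countries) × 3 prompt categories × 2 revision conditions. The following Python sketch is illustrative only and is not the authors' code. The template strings and the "no revise" instruction are copied from the captions; the exact country wording (e.g., "the United States"), the placement of the instruction before the prompt (it ends with a colon, although the caption says "appending"), and the names `TEMPLATES`, `COUNTRIES`, and `build_prompts` are assumptions made for illustration.

```python
# Instruction text quoted in the Figure 2 caption.
NO_REVISE_INSTRUCTION = (
    "I NEED to test how the tool works with extremely simple prompts. "
    "DO NOT add any detail, just use it AS-IS:"
)

# Prompt categories from the captions; {where} is "" or " in {country}".
TEMPLATES = {
    "basic": "A {farm}{where}",
    "typical": "A typical {farm}{where}",
    "reality": (
        "Please create an image that accurately represents the reality "
        "of what most {farm}s{where} look like"
    ),
}

# Countries per farm type (Figures 3 and 4); exact wording is assumed.
COUNTRIES = {
    "dairy farm": ["the United States", "Germany", "New Zealand"],
    "pig farm": ["the United States", "Spain", "Australia"],
}


def build_prompts():
    """Enumerate every (farm, location, category, revision) combination."""
    rows = []
    for farm, countries in COUNTRIES.items():
        locations = [""] + [f" in {country}" for country in countries]
        for where in locations:
            for category, template in TEMPLATES.items():
                base = template.format(farm=farm, where=where)
                # Default condition: the prompt is sent unchanged and
                # DALL-E 3 is free to revise it.
                rows.append({"farm": farm, "category": category,
                             "revise": True, "prompt": base})
                # "No revise" condition: the instruction is placed before
                # the prompt (assumed placement).
                rows.append({"farm": farm, "category": category,
                             "revise": False,
                             "prompt": f"{NO_REVISE_INSTRUCTION} {base}"})
    return rows


if __name__ == "__main__":
    prompts = build_prompts()
    print(len(prompts), "prompts")
    for row in prompts[:4]:
        print(row["revise"], "|", row["prompt"])
```

If each prompt were sampled the same number of times, 4,800 images over 48 prompts would correspond to 100 images per prompt; equal sampling is an inference from the totals, not something the captions state.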