Evaluating Demographic Misrepresentation in Image-to-Image Portrait Editing

Huichan Seo, Minki Hong, Sieun Choi, Jihie Kim, Jean Oh

TL;DR

A prompt-level identity constraint, applied without model updates, can substantially reduce demographic change for minority groups while leaving majority-group portraits largely unchanged, revealing asymmetric identity priors in current editors.

Abstract

Demographic bias in text-to-image (T2I) generation is well studied, yet demographic-conditioned failures in instruction-guided image-to-image (I2I) editing remain underexplored. We examine whether identical edit instructions yield systematically different outcomes across subject demographics in open-weight I2I editors. We formalize two failure modes: Soft Erasure, where edits are silently weakened or ignored in the output image, and Stereotype Replacement, where edits introduce unrequested, stereotype-consistent attributes. We introduce a controlled benchmark that probes demographic-conditioned behavior by generating and editing portraits conditioned on race, gender, and age using a diagnostic prompt set, and evaluate multiple editors with vision-language model (VLM) scoring and human evaluation. Our analysis shows that identity preservation failures are pervasive, demographically uneven, and shaped by implicit social priors, including occupation-driven gender inference. Finally, we demonstrate that a prompt-level identity constraint, without model updates, can substantially reduce demographic change for minority groups while leaving majority-group portraits largely unchanged, revealing asymmetric identity priors in current editors. Together, our findings establish identity preservation as a central and demographically uneven failure mode in I2I editing and motivate demographic-robust editing systems. Project page: https://seochan99.github.io/i2i-demographic-bias
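The prompt-level identity constraint mentioned above can be illustrated with a minimal sketch: the edit instruction is left as written and a short identity-preserving clause is appended before the request reaches the editor. The names `Subject`, `build_feature_prompt`, and `constrain_edit_prompt`, and the wording of the clause itself, are hypothetical and chosen only for illustration; the authors' actual feature prompt is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subject:
    """Demographic attributes of the source portrait (hypothetical schema)."""
    race: str
    gender: str
    age: str


def build_feature_prompt(subject: Subject) -> str:
    # Assumed wording: the real constraint text used in the paper may differ.
    return (
        "Preserve the subject's identity: keep the person's apparent "
        f"race ({subject.race}), gender ({subject.gender}), and "
        f"age group ({subject.age}) unchanged."
    )


def constrain_edit_prompt(edit_prompt: str, subject: Subject) -> str:
    """Append the identity constraint to an edit instruction.

    Only the text prompt changes; the editing model itself is untouched.
    """
    edit = edit_prompt.strip().rstrip(".")
    return f"{edit}. {build_feature_prompt(subject)}"


if __name__ == "__main__":
    subject = Subject(race="Indian", gender="female", age="30-39")
    print(constrain_edit_prompt("Make this person a doctor.", subject))
```

Because the constraint lives entirely in the prompt string, the same wrapper could in principle be placed in front of any instruction-guided editor without retraining.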

Paper Structure (99 sections, 3 equations, 13 figures, 26 tables)


Figures (13)

  • Figure 1: Qualitative examples of demographic-conditioned failures in I2I editing across different prompts and source demographics.
  • Figure 2: Examples of Soft Erasure and Stereotype Replacement.
  • Figure 3: Overview of our study on demographic-conditioned failures in instruction-guided I2I portrait editing. We build a controlled benchmark from FairFace and pair source portraits with edit prompts. For each image-prompt pair, we run three I2I editing models to generate outputs. For diagnosing soft erasure and stereotype replacement, we evaluate $i_{\text{edit}}$; for feature prompt mitigation, we add a feature prompt $p_{\text{feat}}$ and re-run editing. Outputs are assessed via human evaluation and a VLM ensemble.
  • Figure 4: Racial disparities in (a) skin lightening and (b) race change. Indian and Black subjects experience 72–75% skin lightening vs. 44% for White and 54% for East Asian. Race change: Indian 14% vs. White 1%.
  • Figure 5: Qualitative comparison of the baseline and our feature-prompt mitigation. Feature prompts reduce race change for non-White subjects.
  • ...and 8 more figures
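
The group comparisons quoted in the Figure 4 caption (for example, race change of 14% for Indian vs. 1% for White subjects) are per-group rates of a binary outcome. A minimal sketch of that arithmetic follows, assuming each edited image carries a group label and a changed/unchanged flag. The function names `group_rates` and `rate_gap` are hypothetical, and the toy records are invented for illustration; they are not the paper's data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def group_rates(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Return the fraction of flagged outcomes per group.

    Each record is (group_label, changed), where `changed` marks whether
    the attribute of interest (e.g. perceived race) differs after editing.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [num_changed, num_total]
    for group, changed in records:
        counts[group][0] += int(changed)
        counts[group][1] += 1
    return {g: changed / total for g, (changed, total) in counts.items()}


def rate_gap(rates: Dict[str, float], group: str, reference: str) -> float:
    """Difference in rate between a group and a reference group."""
    return rates[group] - rates[reference]


if __name__ == "__main__":
    # Synthetic toy labels, not results from the paper.
    toy = [("A", True), ("A", False), ("B", False), ("B", False)]
    rates = group_rates(toy)
    print(rates, rate_gap(rates, "A", "B"))
```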