Would Deep Generative Models Amplify Bias in Future Models?
Tianwei Chen, Yusuke Hirota, Mayu Otani, Noa Garcia, Yuta Nakashima
TL;DR
The paper asks whether deep generative models amplify social biases in future vision-language (VL) systems. It simulates dataset contamination by progressively replacing real COCO/CC3M images with Stable Diffusion outputs and evaluates two downstream tasks: OpenCLIP-based image-text pretraining and image captioning. Training data is drawn from the mixture $\mathcal{D}(\alpha) = \{ x \sim (1-\alpha)p_{\mathcal{I}}(x) + \alpha p_{\mathcal{G}}(x)\}$, where the contamination parameter $\alpha$ sets the share of generated images ($p_{\mathcal{G}}$) relative to original images ($p_{\mathcal{I}}$). Bias is analyzed across gender, ethnicity, age, and skin tone using metrics such as R@k, LIC, and gender misprediction. The results show no consistent bias amplification: bias is mitigated or amplified depending on the task, attribute, and data source, with likely causes including pre-existing dataset biases and generation artifacts such as blurry faces and stereotyping. The work offers bias-aware recommendations for data collection and evaluation, notes limitations of scale and model choice, and concludes that the task-specific effect of synthetic data on fairness calls for careful bias filtering and systematic evaluation in future model development.
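As a rough illustration of the mixture $\mathcal{D}(\alpha)$, the sketch below builds a contaminated dataset by swapping each original image for a generated one with probability $\alpha$. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function name `contaminate`, the file names, and the one-to-one pairing of original and generated images are hypothetical, and the paper may instead replace a fixed fraction of images rather than sampling per item.

```python
import random


def contaminate(original, generated, alpha, seed=0):
    """Mix two aligned image lists according to a contamination rate.

    Assumption (not from the paper): original[i] and generated[i] refer to
    the same caption, so a swap keeps the image-text pairing intact.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(original) != len(generated):
        raise ValueError("original and generated must have the same length")

    rng = random.Random(seed)  # fixed seed so the mixture is reproducible
    mixed = []
    for orig_item, gen_item in zip(original, generated):
        # Draw from p_G with probability alpha, otherwise from p_I.
        mixed.append(gen_item if rng.random() < alpha else orig_item)
    return mixed


# Hypothetical file names, used only for illustration.
real_images = [f"real_{i}.jpg" for i in range(10)]
synthetic_images = [f"gen_{i}.png" for i in range(10)]

# Sweep alpha to simulate progressively heavier contamination.
for alpha in (0.0, 0.5, 1.0):
    dataset = contaminate(real_images, synthetic_images, alpha)
    share = sum(name.startswith("gen_") for name in dataset) / len(dataset)
    print(f"alpha={alpha:.1f} generated share={share:.2f}")
```

With $\alpha = 0$ the dataset is entirely original and with $\alpha = 1$ entirely generated; intermediate values give a generated share that only approximates $\alpha$ on small samples because each swap is an independent draw.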
Abstract
We investigate the impact of deep generative models on potential social biases in upcoming computer vision models. As the internet witnesses an increasing influx of AI-generated images, concerns arise regarding inherent biases that may accompany them, potentially leading to the dissemination of harmful content. This paper explores whether a detrimental feedback loop, resulting in bias amplification, would occur if generated images were used as the training data for future models. We conduct simulations by progressively substituting original images in COCO and CC3M datasets with images generated through Stable Diffusion. The modified datasets are used to train OpenCLIP and image captioning models, which we evaluate in terms of quality and bias. Contrary to expectations, our findings indicate that introducing generated images during training does not uniformly amplify bias. Instead, instances of bias mitigation across specific tasks are observed. We further explore the factors that may influence these phenomena, such as artifacts in image generation (e.g., blurry faces) or pre-existing biases in the original datasets.
