Examining the Robustness of Homogeneity Bias to Hyperparameter Adjustments in GPT-4
Messi H. J. Lee
TL;DR
This study investigates homogeneity bias in Vision-Language Models, focusing on GPT-4o mini outputs generated from GANFD face images and evaluated via cosine similarity of sentence embeddings. Using a controlled experimental setup and mixed-effects models, we show that homogeneity bias largely persists across a range of hyperparameter values (sampling temperature and top-p) and that its relationship with those values is non-linear, with racial bias and gender bias responding differently to parameter changes. Certain hyperparameter adjustments can mitigate racial bias to some extent, but they do not provide a universal solution, and the differential responses across social dimensions underscore the need for comprehensive bias-mitigation strategies beyond tuning alone. The findings carry implications for practitioners and point to future work on non-linear parameter spaces, open-source model analysis, and broader task contexts to better understand and address homogeneity bias in AI systems.
Abstract
Vision-Language Models trained on massive collections of human-generated data often reproduce and amplify societal stereotypes. One critical form of stereotyping reproduced by these models is homogeneity bias: the tendency to represent certain groups as more homogeneous than others. We investigate how this bias responds to hyperparameter adjustments in GPT-4, specifically examining sampling temperature and top-p, which control the randomness of model outputs. By generating stories about individuals from different racial and gender groups and comparing their similarities using vector representations, we assess both the robustness of the bias and its relationship with hyperparameter values. We find that (1) homogeneity bias persists across most hyperparameter configurations, with Black Americans and women being represented more homogeneously than White Americans and men, (2) the relationship between hyperparameters and group representations shows unexpected non-linear patterns, particularly at extreme values, and (3) hyperparameter adjustments affect racial and gender homogeneity bias differently: increasing temperature or decreasing top-p can reduce racial homogeneity bias, but the same changes do not affect gender homogeneity bias in the same way. Our findings suggest that while hyperparameter tuning may mitigate certain biases to some extent, it cannot serve as a universal solution for addressing homogeneity bias across different social group dimensions.
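To make the similarity comparison concrete, the following is a minimal sketch of how within-group homogeneity could be summarized as the mean pairwise cosine similarity of story embeddings. It is an illustration under stated assumptions, not the analysis used in the study: the function names and the toy two-dimensional vectors are hypothetical stand-ins for real sentence embeddings, and the simple group mean replaces the mixed-effects modeling.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_similarity(embeddings):
    """Average cosine similarity over all unordered pairs in one group.

    A higher value means the group's texts are more alike, i.e. the
    group is represented more homogeneously.
    """
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        raise ValueError("need at least two embeddings")
    return sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)


# Hypothetical toy vectors standing in for sentence embeddings of
# stories generated about two groups (assumption, not real data).
group_a = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
group_b = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

sim_a = mean_pairwise_similarity(group_a)
sim_b = mean_pairwise_similarity(group_b)

# In this toy setup, group_a is the more homogeneous group.
print(f"group_a: {sim_a:.3f}, group_b: {sim_b:.3f}")
```

In this framing, homogeneity bias would appear as a consistent gap between the two group-level values, and the question is whether that gap survives when the stories are regenerated at different temperature and top-p settings.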
