Evaluating Perspectival Biases in Cross-Modal Retrieval
Teerapol Saengsukhiran, Peerawat Chomphooyod, Narabodee Rodjananant, Chompakorn Chaksangchaichot, Patawee Prakrankamanant, Witthawin Sripheanpol, Pak Lovichit, Sarana Nutanong, Ekapol Chuangsuwanich
TL;DR
The paper tackles perspectival biases in cross-modal retrieval and identifies two distinct forms: prevalence bias in image-to-text retrieval and association bias in text-to-image retrieval. It introduces $DLBKL$, a rank-aware extension of $LBKL$, and the 3XCM benchmark with the Self-Preference Cultural Bias Score (SP) to quantify these biases. Experiments across diverse model families (dense vision-language retrievers, cross-lingually aligned models, and multilingual LLM-based embedders) show that explicit cross-lingual alignment markedly reduces both biases, although association bias remains the harder of the two to mitigate. The work provides practical metrics and datasets for evaluating fairness in multilingual multimodal systems and calls for training strategies that enforce global semantic mappings rather than relying on data scale alone.
Abstract
Multimodal retrieval systems are expected to operate in a semantic space, agnostic to the language or cultural origin of the query. In practice, however, retrieval outcomes systematically reflect perspectival biases: deviations shaped by linguistic prevalence and cultural associations. We study two such biases. First, prevalence bias refers to the tendency to favor entries from prevalent languages over semantically faithful entries in image-to-text retrieval. Second, association bias refers to the tendency to favor images culturally associated with the query over semantically correct ones in text-to-image retrieval. Results show that explicit alignment is a more effective strategy for mitigating prevalence bias. However, association bias remains a distinct and more challenging problem. These findings suggest that achieving truly equitable multimodal systems requires targeted strategies beyond simple data scaling, and that bias arising from cultural association may need to be treated as a harder problem than bias arising from linguistic prevalence.
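To make the notion of prevalence bias concrete, the sketch below measures how unevenly languages are represented among the captions retrieved for an image query, assuming every candidate caption exists in each language with the same meaning. It is an illustration only and not the paper's LBKL or DLBKL definition: the function names, the language codes, and the choice of a KL divergence from the uniform distribution are all assumptions made for this example.

```python
import math
from collections import Counter


def language_share(retrieved_langs, languages):
    """Fraction of retrieved captions written in each language (hypothetical helper)."""
    if not retrieved_langs:
        raise ValueError("retrieved_langs must not be empty")
    counts = Counter(retrieved_langs)
    total = len(retrieved_langs)
    return {lang: counts.get(lang, 0) / total for lang in languages}


def kl_from_uniform(share):
    """KL divergence (in nats) of a language distribution from the uniform one.

    0.0 means every language is equally represented; larger values mean
    the retrieved set is dominated by fewer languages.
    """
    uniform = 1.0 / len(share)
    # Terms with p == 0 contribute nothing to the sum.
    return sum(p * math.log(p / uniform) for p in share.values() if p > 0)


# Assumed toy data: languages of the top-6 captions returned for one image query.
languages = ["en", "th", "ja"]
balanced = ["en", "th", "ja", "en", "th", "ja"]
skewed = ["en", "en", "en", "en", "th", "en"]

balanced_score = kl_from_uniform(language_share(balanced, languages))
skewed_score = kl_from_uniform(language_share(skewed, languages))
print(f"balanced: {balanced_score:.3f}, skewed: {skewed_score:.3f}")
```

Under these assumptions, a retriever that treats semantically equivalent captions alike would score near zero, while one that favors a prevalent language would score higher. This toy score ignores rank positions within the retrieved list, which is the aspect a rank-aware measure is meant to capture.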
