Mind the Gesture: Evaluating AI Sensitivity to Culturally Offensive Non-Verbal Gestures

Akhila Yerukola, Saadia Gabriel, Nanyun Peng, Maarten Sap

TL;DR

This work introduces MC-SIGNS, a cross-cultural gesture dataset with 288 gesture-country pairs across 25 gestures and 85 countries, annotated for offensiveness, cultural significance, and context. It systematically evaluates AI systems across text-to-image, language, and vision-language modalities, revealing pervasive US-centric biases, over-flagging by LLMs, and context-sensitive safety gaps that worsen with scene descriptions. Implicit-meaning analyses show models default to US interpretations even for universal concepts, underscoring the need for culturally aware safety frameworks and regionally informed data. The authors release MC-SIGNS and accompanying code to propel research on inclusive, culturally safe AI deployment in global applications.

Abstract

Gestures are an integral part of non-verbal communication, with meanings that vary across cultures, and misinterpretations that can have serious social and diplomatic consequences. As AI systems become more integrated into global applications, ensuring they do not inadvertently perpetuate cultural offenses is critical. To this end, we introduce Multi-Cultural Set of Inappropriate Gestures and Nonverbal Signs (MC-SIGNS), a dataset of 288 gesture-country pairs annotated for offensiveness, cultural significance, and contextual factors across 25 gestures and 85 countries. Through systematic evaluation using MC-SIGNS, we uncover critical limitations: text-to-image (T2I) systems exhibit strong US-centric biases, performing better at detecting offensive gestures in US contexts than in non-US ones; large language models (LLMs) tend to over-flag gestures as offensive; and vision-language models (VLMs) default to US-based interpretations when responding to universal concepts like wishing someone luck, frequently suggesting culturally inappropriate gestures. These findings highlight the urgent need for culturally-aware AI safety mechanisms to ensure equitable global deployment of AI technologies.

Paper Structure

This paper contains 70 sections, 49 figures, 12 tables.

Figures (49)

  • Figure 1: Interpretations of gestures vary dramatically across regions and cultures. "Crossing your fingers", while commonly used in the US to wish for good luck, can be considered deeply offensive to female audiences in parts of Vietnam. AI systems, such as T2I models, should be culturally competent and avoid generating visual elements that risk miscommunication or offense in specific cultural contexts.
  • Figure 2: RQ1: LLM Offensiveness classification shows high recall, low specificity, and a tendency to over-flag gestures as offensive.
  • Figure 3: RQ1: T2I Imagen 3 detects offensive gestures better, while DALLE-3 prioritizes avoiding false rejections (high specificity) at the cost of safety. Scene descriptions weaken safety filters.
  • Figure 4: RQ1: VLM Offensiveness classification varies, with some models performing at random chance and others over-flagging gestures, shown by high recall and low specificity.
  • Figure 5: Accuracy comparison of DALLE-3 and Imagen 3 in identifying offensive gestures across US and non-US contexts. DALLE-3 struggles in non-US contexts while performing moderately in US contexts. Imagen 3 shows high accuracy overall, with a performance drop on gestures offensive in non-US contexts.
  • ...and 44 more figures
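
The captions for Figures 2 to 4 describe model behavior in terms of recall and specificity. The sketch below shows how the two metrics are conventionally computed for a binary "offensive / not offensive" classifier. It is not the authors' evaluation code: the function name and the toy labels are hypothetical, chosen only to show why a model that over-flags scores high recall but low specificity.

```python
from typing import Sequence, Tuple


def recall_and_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (recall, specificity) for binary labels.

    Assumed label convention (not from the paper): 1 = offensive,
    0 = not offensive.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # offensive, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # offensive, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign, allowed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign, flagged

    # Recall: share of truly offensive items that were flagged.
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: share of truly benign items that were not flagged.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return recall, specificity


# Toy, made-up labels: the classifier flags almost everything as offensive.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0]
recall, specificity = recall_and_specificity(y_true, y_pred)
print(f"recall={recall:.2f} specificity={specificity:.2f}")
```

In this toy case every offensive item is caught (recall 1.00) but three of four benign items are wrongly flagged (specificity 0.25), which is the over-flagging pattern the captions describe.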