Enhancing Content Moderation with Culturally-Aware Models

Alex J. Chan, José Luis Redondo García, Fabrizio Silvestri, Colm O'Donnell, Konstantina Palla

TL;DR

This work introduces a flexible framework that enhances foundation language models with cultural knowledge, improving the accuracy of local violation detection and offering explanations that align more closely with regional cultural norms.

Abstract

Content moderation on a global scale must navigate a complex array of local cultural distinctions, which can hinder effective enforcement. While global policies aim for consistency and broad applicability, they often miss the subtleties of regional language interpretation, cultural beliefs, and local legislation. This work introduces a flexible framework that enhances foundation language models with cultural knowledge. Our approach fine-tunes encoder-decoder models on media-diet data to capture cultural nuances, then applies a continued training regime to integrate these models into a content moderation pipeline. We evaluate this framework in a case study of an online podcast platform with content spanning various regions. The results show that our culturally adapted models improve the accuracy of local violation detection and offer explanations that align more closely with regional cultural norms. Our findings reinforce the need for a content moderation approach that adapts to the diverse cultural landscapes it operates in, and they represent a step towards a more equitable and culturally sensitive framework for content moderation.

Paper Structure

This paper contains 41 sections and 14 figures.

Figures (14)

  • Figure 1: A culturally adaptive content moderation workflow. Language models attuned to local culture 1.) enhance automatic detection and explanation of violations by disentangling local nuances and 2.) aid in aligning human annotators with global guidelines by serving as a reasoning engine, identifying and addressing mismatches in cultural norms between the moderator and location.
  • Figure 2: Proposed Three-Step Framework for Cultural Media-diet Violative Detection Models. First, an encoder-decoder model is fine-tuned to summarise local popular news articles. Second, the pre-trained model is fine-tuned on generating moderator-written rationales of moderation decisions. Last, utilising the frozen encoder, a new classification head is trained to predict whether the content would be classified as a violation or not.
  • Figure 3: Media-diet Model Performance. (Left): Heatmap showing normalised improvement of cultural media-diet models across test sets. A strong leading diagonal indicates each model making proportionally larger gains in their own culture. (Right): AUROC performance of the ten cultural-diet models and baseline on content from UK, USA, Canada and Australia.
  • Figure 4: Exploring the United States Model. Score distributions from the US model for (top) violative cases and (bottom) the 'FYI' subset, stratified by US and non-US origin.
  • Figure 5: Cultural Model Explanations. An example of how different cultural models explain an "FYI" lead created in the Australian market.
  • ...and 9 more figures
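
The last step described in the Figure 2 caption (training a new classification head on top of a frozen encoder) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the caption does not specify the head architecture, so a logistic-regression head is used here, and the two-dimensional feature vectors are toy stand-ins for the outputs of a frozen encoder, not data from the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(features, labels, lr=0.5, epochs=500):
    """Fit a logistic-regression head on fixed features.

    The features are never updated, which mirrors keeping the encoder
    frozen: only the head parameters (w, b) are trained.
    """
    dim = len(features[0])
    w = [0.0] * dim
    b = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the log loss w.r.t. the logit
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        # Batch gradient descent step, averaged over the examples.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return the predicted probability that x is a violation."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Toy stand-ins for frozen-encoder outputs (hypothetical values).
features = [[0.9, 0.1], [0.8, 0.3], [0.2, 0.7], [0.1, 0.9]]
labels = [1, 1, 0, 0]  # 1 = violation, 0 = not a violation

w, b = train_head(features, labels)
scores = [predict(w, b, x) for x in features]
```

In a real pipeline the feature vectors would come from the fine-tuned encoder, and the head could equally be a small neural network; the point of the sketch is only that the encoder's representation stays fixed while the classifier is learned on top of it.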