"I Searched for a Religious Song in Amharic and Got Sexual Content Instead": Investigating Online Harm in Low-Resourced Languages on YouTube

Hellina Hailu Nigatu, Inioluwa Deborah Raji

TL;DR

This work examines online harm on YouTube for Amharic-speaking users, a low-resource language context. It combines two studies—a qualitative interview study with 15 women and a platform-data analysis of 9313 search results, 3336 recommendations, 120 channels, and 406 comments—to reveal unplanned exposure to policy-violating sexual content and the under-performance of language technologies in moderation. The findings show that search and recommendation systems can propagate harmful content, while channels deploy advanced avoidance strategies and linguistic obfuscation to evade enforcement. The authors discuss implications for platform design, NGO initiatives, and government regulation to better protect marginalized language communities and reduce online harm in low-resource settings.

Abstract

Online social media platforms such as YouTube have a wide, global reach. However, little is known about the experience of low-resourced language speakers on such platforms, especially how they experience and navigate harmful content. To better understand this, we (1) conducted semi-structured interviews (n=15) and (2) analyzed search results (n=9313), recommendations (n=3336), channels (n=120), and comments (n=406) of policy-violating sexual content on YouTube, focusing on the Amharic language. Our findings reveal that -- although Amharic-speaking YouTube users find the platform crucial for several aspects of their lives -- participants reported unplanned exposure to policy-violating sexual content when searching for benign, popular queries. Furthermore, malicious content creators seem to exploit under-performing language technologies and content moderation to further target vulnerable groups of speakers, including migrant domestic workers, diaspora, and local Ethiopians. Overall, our study sheds light on how failures in low-resourced language technology may lead to exposure to harmful content and suggests implications for stakeholders in minimizing harm.

Content Warning: This paper includes discussions of NSFW topics and harmful content (hate, abuse, sexual harassment, self-harm, misinformation). The authors do not support the creation or distribution of harmful content.

Paper Structure (37 sections, 9 figures, 6 tables)

Figures (9)

  • Figure 1: CW: Discussion of sexual content. In Fig. \ref{fig:docgeez}, we present a screenshot of the search results for the query "Doctor" written in Ge'ez script. The first four results are from a medical YouTube channel by Ethiopian doctors, an entertainment talk show, a news channel, and a talk show featuring a psychiatric doctor. The fifth result (highlighted in a red box) is a sexual video with a picture of an Ethiopian girl and explicit sexual writing on the thumbnail. The title says 'He made me [EXPLICIT WORD] 7 times'. The video is from a channel whose name starts with "Dr." Similarly, in Fig. \ref{fig:tvshow}, the first three results for a famous TV show are episodes of the TV show, and the fourth result (highlighted in a red box) is a sexual video with characteristics similar to the one in Fig. \ref{fig:docgeez}.
  • Figure 2: CW: Discussion of sexual content. Scrolling down the search results for a famous Ethiopian celebrity. Sexual videos are not limited to direct responses to search queries. Here, in the "People also watched" section, we found a sexual video showing two people engaged in a sexual act on a sofa, with a title indicating a person cheating on her husband with a satellite dish maintenance person.
  • Figure 3: CW: Discussion of sexual content. Screenshot showing recommendations for one of the sexual videos opened to collect recommendation data. This video is from a verified channel that exclusively posts sexual videos in Amharic. All the recommendations for the one video opened from this channel are other sexual videos all from the same channel. The videos would have an image of a woman, often Ethiopian, and explicit sexual writing in Amharic. The channel also uses a 'Dr.' title in their channel name.
  • Figure 4: CW: Discussion of sexual content. In trying to label some videos from the collected data, we searched the titles of some videos on the YouTube interface. The results for the Ge'ez-based titles would sometimes return graphically explicit videos. In this case, the search query included the word "GOOD" which phonetically is similar to a word in Amharic used to describe astonishment. The fourth and fifth search results (highlighted in red boxes) were playlists with the title "Good" but had video thumbnails of actual pornographic videos.
  • Figure 5: CW: Discussion of sexual content. Fig. \ref{fig:recs} shows a screenshot of the recommendation list for an Amharic sexual video we opened for data collection. The video has explicit Amharic writing and uses lexical variation by mixing Ge'ez and Latin characters. Fig. \ref{fig:animals} shows screenshots of recommended videos as we scroll down the recommendations for an Amharic sexual video. Recommendations include sexual videos in other languages (highlighted in teal), sexual scenes cut from movies (highlighted in pink), Amharic sexual videos from other channels (highlighted in yellow), and videos of animals mating (highlighted in red).
  • ...and 4 more figures