Nollywood: Let's Go to the Movies!

John E. Ortega, Ibrahim Said Ahmad, William Chen

TL;DR

The paper tackles the challenge of Nigerian English dialects in Nollywood by proposing a phonetic subtitle model to translate Nigerian English speech to American English and by applying state-of-the-art toxicity detectors to analyze speech content. It combines corpora from Nollywood films and the ICE-Nigeria dataset to assess both toxicity and automatic speech recognition (ASR) across dialects, employing metrics such as the word error rate (WER) and toxicity detectors like ETOX and Seamless4MT. Key findings reveal low observed toxicity but substantial ASR difficulties for Nigerian English, with WER markedly higher for Nigerian speech (e.g., over 90% with Whisper and around 40% with XLS-R on ICE), and even extreme values (above 100%) on some Deep Cut transcripts. The work highlights the need for dialect-aware ASR and cross-language toxicity methods, suggesting broader Nigerian-language data collection and targeted model adaptation to improve accessibility and content safety in low-resource language settings.
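A WER above 100% can look like a mistake, but it follows from the standard definition: the number of word substitutions, deletions, and insertions divided by the number of words in the reference transcript. Because insertions are counted and the denominator is the reference length, an ASR output much longer than the reference can push the ratio past 1. The sketch below is a minimal illustration of that standard definition, not the paper's evaluation code; the function name and the example sentences are made up for illustration, and it assumes whitespace tokenization with no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper (hypothetical name); assumes whitespace
    tokenization and no case or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: a two-word reference and a seven-word ASR output.
# Both reference words appear in order, so the distance is 5 insertions.
print(word_error_rate("go home", "they will all go back home now"))  # 2.5
```

In the made-up example the score is 2.5, i.e. 250%, even though every reference word was recognized. Scores above 100% therefore tend to signal heavy over-generation by the recognizer rather than a bug in the metric.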

Abstract

Nollywood, based on the idea of Bollywood from India, is a series of outstanding movies that originate from Nigeria. Unfortunately, while the movies are in English, they are hard for many native speakers to understand because of the dialect of English that is spoken. In this article, we accomplish two goals: (1) create a phonetic subtitle model that is able to translate Nigerian English speech to American English and (2) use the most advanced toxicity detectors to discover how toxic the speech is. Our aim is to highlight the text in these videos, which is oftentimes ignored for lack of dialectal understanding due to the fact that many people in Nigeria speak a native language like Hausa at home.

Paper Structure (14 sections, 4 figures, 2 tables)

Figures (4)

  • Figure 1: Four sentences used to create spectrograms for initial comparison between English spoken in Nigeria and the United States of America.
  • Figure 2: Spectrogram comparison of four sentences in English spoken by speakers from the USA and Nigeria.
  • Figure 3: Overview of the two ASR architectures, Whisper (left) and XLS-R (right).
  • Figure 4: Toxicity results for Deepcut and Acrimony datasets.