Nollywood: Let's Go to the Movies!

John E. Ortega; Ibrahim Said Ahmad; William Chen

TL;DR

The paper tackles the challenge of Nigerian English dialects in Nollywood by proposing a phonetic subtitle model to translate Nigerian English speech to American English and by applying state-of-the-art toxicity detectors to analyze speech content. It combines corpora from Nollywood films and the ICE-Nigeria dataset to assess both toxicity and automatic speech recognition (ASR) across dialects, using word error rate (WER) as the ASR metric and toxicity detectors such as ETOX and Seamless4MT. Key findings reveal low observed toxicity but substantial ASR difficulties for Nigerian English, with WER markedly higher for Nigerian speech (e.g., over 90% with Whisper and around 40% with XLS-R on ICE), and even extreme values (> 100%) on some Deep Cut transcripts. The work highlights the need for dialect-aware ASR and cross-language toxicity methods, suggesting broader Nigerian-language data collection and targeted model adaptation to improve accessibility and content safety in low-resource language settings.
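A WER above 100% can look like a mistake, so a small illustration may help. The sketch below assumes the standard definition of WER (substitutions, deletions, and insertions divided by the number of reference words), computed with a word-level edit distance over whitespace-split tokens. The function name `word_error_rate` and the sample sentences are illustrative assumptions; this is not the evaluation code used in the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumes simple whitespace tokenization with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical example: the recognizer emits many extra words,
    # so insertions alone outnumber the reference words.
    print(word_error_rate("go home", "we will all go back home now"))
```

Because insertions count as errors while the denominator is fixed at the reference length, a hypothesis much longer than the reference can push the ratio past 1.0, which is one way values above 100% can arise.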

Abstract

Nollywood, based on the idea of Bollywood from India, is a series of outstanding movies that originate from Nigeria. Unfortunately, while the movies are in English, they are hard to understand for many native speakers due to the dialect of English that is spoken. In this article, we accomplish two goals: (1) create a phonetic subtitle model that is able to translate Nigerian English speech to American English and (2) use the most advanced toxicity detectors to discover how toxic the speech is. Our aim is to highlight the text in these videos, which is often ignored for lack of dialectal understanding due to the fact that many people in Nigeria speak a native language like Hausa at home.

Paper Structure (14 sections, 4 figures, 2 tables)

Introduction
Related Work
Methodology
Corpora
Toxicity
Toxicity Metric
Seamless4MT
ETOX
Automatic Speech Recognition
Results
Toxicity
ASR
Conclusion
Future Work

Figures (4)

Figure 1: Four sentences used to create spectrograms for initial comparison between English spoken in Nigeria and the United States of America.
Figure 2: Spectrogram comparison of four sentences in English spoken by speakers from the USA and Nigeria.
Figure 3: Overview of the two ASR architectures, Whisper (left) and XLS-R (right).
Figure 4: Toxicity results for Deepcut and Acrimony datasets.
