ANIM-400K: A Large-Scale Dataset for Automated End-To-End Dubbing of Video

Kevin Cai; Chonghua Liu; David M. Chan

TL;DR

The paper tackles the data bottleneck in automated video dubbing by introducing Anim-400K, a large-scale dataset of over 425K aligned Japanese-English (JP-EN) dubbed clips designed for end-to-end dubbing and a suite of secondary video tasks. It presents a bottom-up data collection pipeline and a top-down annotation pipeline that together produce fully aligned audio, transcripts, speaker labels, and rich metadata, including backing tracks for realistic mixing. Beyond dubbing, Anim-400K supports video summarization, character identification, genre/theme/style classification, video quality analysis, and simultaneous translation (ST), with English subtitles that overlap the Japanese audio to enable ST research. Anim-400K is publicly available and significantly larger than prior corpora, making it a resource for advancing end-to-end dubbing research and related multimedia analysis. The paper also addresses the ethical, cultural, and practical considerations inherent to large-scale synthetic audio generation.

Abstract

The Internet's wealth of content, with up to 60% published in English, contrasts starkly with the global population, where only 18.8% are English speakers and just 5.1% consider it their native language, leading to disparities in online information access. Unfortunately, automated dubbing of video (replacing the audio track of a video with a translated alternative) remains a complex and challenging task, because dubbing pipelines require precise timing, facial movement synchronization, and prosody matching. While end-to-end dubbing offers a solution, data scarcity continues to impede the progress of both end-to-end and pipeline-based methods. In this work, we introduce Anim-400K, a comprehensive dataset of over 425K aligned animated video segments in Japanese and English that supports various video-related tasks, including automated dubbing, simultaneous translation, guided video summarization, and genre/theme/style classification. Our dataset is made publicly available for research purposes at https://github.com/davidmchan/Anim400K.
