FocusedAD: Character-centric Movie Audio Description

Xiaojun Ye; Chun Wang; Yiren Song; Sheng Zhou; Liangcheng Li; Jiajun Bu

TL;DR

FocusedAD tackles automatic movie audio description by generating character-centric narration with explicit name references and narrative relevance. It introduces a Character Perception Module to identify and track main characters, a Dynamic Prior Module that injects context from prior ADs and subtitles via learnable soft prompts, and a Focused Caption Module that fuses scene, character, and text tokens through an LLM to produce concise, story-aware descriptions. An automated pipeline builds a robust character query bank so that identities can still be recognized when a character's appearance changes. The approach achieves state-of-the-art performance, including strong zero-shot results on MAD-eval-Named and Cinepile-AD, demonstrating improved narrative coherence and accessibility for blind and visually impaired (BVI) audiences.

Abstract

Movie Audio Description (AD) aims to narrate visual content during dialogue-free segments, particularly benefiting blind and visually impaired (BVI) audiences. Compared with general video captioning, AD demands plot-relevant narration with explicit character name references, posing unique challenges in movie understanding. To identify active main characters and focus on storyline-relevant regions, we propose FocusedAD, a novel framework that delivers character-centric movie audio descriptions. It includes: (i) a Character Perception Module (CPM) for tracking character regions and linking them to names; (ii) a Dynamic Prior Module (DPM) that injects contextual cues from prior ADs and subtitles via learnable soft prompts; and (iii) a Focused Caption Module (FCM) that generates narrations enriched with plot-relevant details and named characters. To overcome limitations in character identification, we also introduce an automated pipeline for building character query banks. FocusedAD achieves state-of-the-art performance on multiple benchmarks, including strong zero-shot results on MAD-eval-Named and our newly proposed Cinepile-AD dataset. Code and data will be released at https://github.com/Thorin215/FocusedAD .
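
The idea of linking a tracked character region to a name through a character query bank can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how regions are matched to names, so the cosine-similarity nearest-neighbour rule, the `threshold` value, and the function names `cosine_similarity` and `link_region_to_name` are all assumptions made for illustration. The bank stores several exemplar embeddings per character, which is one plausible way to tolerate appearance changes.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def link_region_to_name(
    region_embedding: Sequence[float],
    query_bank: Dict[str, List[Sequence[float]]],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> Optional[str]:
    """Return the name whose best exemplar is most similar to the region.

    Assumption: matching is nearest-neighbour over cosine similarity.
    Returns None when no exemplar scores above `threshold`, i.e. the
    region is treated as an unnamed or background character.
    """
    best_name: Optional[str] = None
    best_score = threshold
    for name, exemplars in query_bank.items():
        for exemplar in exemplars:
            score = cosine_similarity(region_embedding, exemplar)
            if score > best_score:
                best_name, best_score = name, score
    return best_name


# Toy 2-D embeddings; a real system would use features from a visual encoder.
example_bank = {
    "Alice": [[1.0, 0.0], [0.8, 0.6]],  # two appearances of the same character
    "Bob": [[0.0, 1.0]],
}
matched = link_region_to_name([0.9, 0.1], example_bank)
```

In a full pipeline the returned name would be passed on, together with the region features, to the captioning stage; regions that return `None` would simply be described without a name.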
