Towards Automated Movie Trailer Generation
Dawit Mureja Argaw, Mattia Soldan, Alejandro Pardo, Chen Zhao, Fabian Caba Heilbron, Joon Son Chung, Bernard Ghanem
TL;DR
This work introduces the Trailer Generation Transformer (TGT), an autoregressive encoder-decoder framework that translates full movies into plausible trailers by modeling both as shot sequences, using a trailerness-aware encoder and a context-aware Transformer decoder. Trained on paired movie-trailer data with reconstruction, trailerness, and KL-divergence losses, TGT learns both which shots to include and how to order them, so the resulting trailers can be non-chronological and narrative-driven. The authors construct two automatic trailer generation (ATG) benchmarks on MAD and MovieNet and show that TGT outperforms prior trailer-generation and video-summarization methods across multiple metrics, including F1, LD, and SLD. They also demonstrate the benefits of text-conditioned generation and provide a shot-selection analysis. The work highlights practical implications for automating initial trailer assembly while preserving editorial flexibility, and proposes future extensions that incorporate dialogue and audio modeling for more realistic trailers.
Abstract
Movie trailers are an essential tool for promoting films and attracting audiences. However, the process of creating trailers can be time-consuming and expensive. To streamline this process, we propose an automatic trailer generation framework that generates plausible trailers from a full movie by automating shot selection and composition. Our approach draws inspiration from machine translation techniques and models movies and trailers as sequences of shots, thus formulating the trailer generation problem as a sequence-to-sequence task. We introduce the Trailer Generation Transformer (TGT), a deep-learning framework utilizing an encoder-decoder architecture. The TGT movie encoder is tasked with contextualizing each movie shot representation via self-attention, while the autoregressive trailer decoder predicts the feature representation of the next trailer shot, accounting for the relevance of shots' temporal order in trailers. TGT significantly outperforms previous methods on a comprehensive suite of metrics.
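The sequence-to-sequence formulation in the abstract can be illustrated with a small sketch. This is not the authors' implementation; it only mirrors the overall flow of contextualizing movie shots with self-attention and then decoding a trailer one shot at a time. The parameter-free attention, the `predict_next` callable standing in for a trained decoder, and the nearest-neighbour retrieval step with a no-repeat constraint are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def self_attention(shots):
    """Parameter-free, single-head self-attention (illustrative assumption).

    Each shot feature is replaced by a softmax-weighted average of all
    shot features, so every output vector carries movie-wide context.
    """
    if not shots:
        return []
    dim = len(shots[0])
    contextualized = []
    for query in shots:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in shots
        ]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        contextualized.append([
            sum(w * key[d] for w, key in zip(weights, shots)) / total
            for d in range(dim)
        ])
    return contextualized


def generate_trailer(movie_shots, predict_next, max_len):
    """Autoregressively pick movie shot indices in trailer order.

    `predict_next(context, history)` is a hypothetical stand-in for a
    trained decoder: it returns the predicted feature vector of the next
    trailer shot given the encoded movie and the shots chosen so far.
    """
    context = self_attention(movie_shots)
    trailer = []
    for _ in range(max_len):
        candidates = [i for i in range(len(context)) if i not in trailer]
        if not candidates:
            break
        history = [context[i] for i in trailer]
        predicted = predict_next(context, history)
        # Assumption: map the predicted feature to the closest unused shot.
        best = max(candidates, key=lambda i: cosine(predicted, context[i]))
        trailer.append(best)
    return trailer
```

Because each step conditions on the shots already chosen, the returned index list need not follow the movie's chronological order, which matches the abstract's emphasis on the temporal order of shots within the trailer.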
