Find the Cliffhanger: Multi-Modal Trailerness in Soap Operas
Carlo Bretti, Pascal Mettes, Hendrik Vincent Koops, Daan Odijk, Nanne van Noord
TL;DR
The paper tackles the challenge of predicting trailerness in long-form soap operas to aid editors in trailer creation. It introduces a multi-modal, multi-scale Trailerness Transformer that processes visual and textual signals at clip- and shot-level scales, trained with editor-derived labels from the GTST dataset. The study shows that combining modalities and scales yields more accurate trailerness predictions, achieving a best F1 of around 9.2% on GTST and outperforming baselines such as random, MLP, and frame-based approaches. By releasing the GTST dataset and code, the work provides a practical, open pathway for improving trailer generation in soap operas and other long-form content.
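The F1 figure above compares predicted trailer-worthy moments against editor-derived labels. The sketch below shows one way such a score can be computed. It is a minimal illustration that assumes each shot receives a scalar trailerness score and that the k highest-scoring shots count as predicted positives. The summary does not describe the paper's exact selection or evaluation protocol, and the function name `f1_at_k` is hypothetical.

```python
from typing import Sequence


def f1_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """F1 of the k highest-scoring shots against binary editor labels.

    Hypothetical protocol (assumption, not the paper's definition): the top-k
    scored shots are the predicted trailer shots, and labels[i] == 1 means
    editors used shot i in a trailer.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    # Rank shot indices by descending trailerness score; sorted() is stable,
    # so tied shots keep their original (temporal) order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    predicted = set(ranked[:k])
    positives = {i for i, y in enumerate(labels) if y == 1}
    true_pos = len(predicted & positives)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(positives)
    return 2 * precision * recall / (precision + recall)


# Usage with toy scores for five shots, two of which editors selected.
scores = [0.9, 0.1, 0.8, 0.3, 0.7]
labels = [1, 0, 0, 0, 1]
print(f1_at_k(scores, labels, k=2))  # top-2 = shots 0 and 2, one hit
```

Choosing k changes the precision/recall balance, so any reported F1 depends on how many moments are selected per episode.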
Abstract
Creating a trailer requires carefully picking out and piecing together brief, enticing moments from a longer video, making it a challenging and time-consuming task. This requires selecting moments based on both visual and dialogue information. We introduce a multi-modal method for predicting trailerness to assist editors in selecting trailer-worthy moments from long-form videos. We present results on a newly introduced soap opera dataset, demonstrating that predicting trailerness is a challenging task that benefits from multi-modal information. Code is available at https://github.com/carlobretti/cliffhanger
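Using both visual and dialogue information at shot and clip scales, as the TL;DR describes, means building an input sequence from two modalities at two granularities. The sketch below shows one possible layout under stated assumptions. Per-shot visual and dialogue embeddings are taken as precomputed. Modalities are fused by simple concatenation. Each clip is a mean-pooled window of consecutive shots. These choices and the names `build_tokens` and `shots_per_clip` are hypothetical stand-ins. They are not the Trailerness Transformer's actual feature extractors or fusion scheme.

```python
from typing import List, Sequence

Vector = List[float]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def build_tokens(
    visual: Sequence[Vector],
    dialogue: Sequence[Vector],
    shots_per_clip: int,
) -> List[Vector]:
    """Return shot-level tokens followed by clip-level tokens.

    Hypothetical layout (assumption): each shot token concatenates that shot's
    visual and dialogue embeddings; each clip token is the mean of the shot
    tokens in a window of `shots_per_clip` consecutive shots.
    """
    if len(visual) != len(dialogue):
        raise ValueError("need one visual and one dialogue vector per shot")
    # Fine scale: concatenate the two modalities for every shot.
    shot_tokens = [list(v) + list(d) for v, d in zip(visual, dialogue)]
    # Coarse scale: pool consecutive shots into clips (the last clip may be shorter).
    clip_tokens = [
        mean_pool(shot_tokens[i : i + shots_per_clip])
        for i in range(0, len(shot_tokens), shots_per_clip)
    ]
    return shot_tokens + clip_tokens


# Usage: three shots with 1-d toy embeddings per modality, two shots per clip.
visual = [[1.0], [3.0], [5.0]]
dialogue = [[0.0], [2.0], [4.0]]
tokens = build_tokens(visual, dialogue, shots_per_clip=2)
print(tokens)  # three shot tokens followed by two clip tokens
```

In a transformer-based model, a sequence like this would typically receive scale or position embeddings before self-attention, so that each shot can attend to both neighboring shots and coarser clip context.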
