FilmComposer: LLM-Driven Music Production for Silent Film Clips
Zhifeng Xie, Qile He, Youjia Zhu, Qiwei He, Mengtian Li
TL;DR
This work tackles the challenge of producing high-quality, cinematically coherent film music for silent clips by introducing FilmComposer, an LLM-driven pipeline that imitates the professional music workflow (spotting, composition, arrangement, mixing). It fuses waveform and symbolic generation through three modules (visual processing, rhythm-controllable MusicGen, and a multi-agent assessment, arrangement, and mix stage) to improve audio quality, musicality, and musical development, while enabling user control. A dedicated dataset, MusicPro-7k, comprising 7,418 film clips with descriptions, rhythm spots, and main melodies, underpins training and evaluation, complemented by new metrics for musicality, development, and audiovisual alignment. Empirical results show state-of-the-art performance across quality, video correspondence, diversity, and musical development, with strong interactivity that supports integration into real production pipelines and education.
Abstract
In this work, we implement music production for silent film clips using an LLM-driven method. Given the strong professional demands of film music production, we propose FilmComposer, which simulates the actual workflows of professional musicians. FilmComposer is the first to combine large generative models with a multi-agent approach, leveraging the advantages of both waveform and symbolic music generation. Additionally, FilmComposer is the first to focus on the three core elements of music production for film (audio quality, musicality, and musical development) and introduces various controls, such as rhythm, semantics, and visuals, to enhance these key aspects. Specifically, FilmComposer consists of a visual processing module, rhythm-controllable MusicGen, and multi-agent assessment, arrangement, and mix. In addition, our framework can integrate seamlessly into the actual music production pipeline and allows user intervention at every step, providing strong interactivity and a high degree of creative freedom. Furthermore, considering the lack of a professional, high-quality film music dataset, we propose MusicPro-7k, which includes 7,418 film clips together with music, descriptions, rhythm spots, and main melodies. Finally, both the standard metrics and the new specialized metrics we propose demonstrate that the music generated by our model achieves state-of-the-art performance in terms of quality, consistency with video, diversity, musicality, and musical development. Project page: https://apple-jun.github.io/FilmComposer.github.io/
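The three-stage structure and the "user intervention at every step" property described above can be pictured with a small control-flow sketch. This is not the authors' implementation: every name here (`ProductionState`, `run_pipeline`, the stage functions, the `on_stage_done` hook) is hypothetical, and the stage bodies are placeholders that only pass strings along rather than analyzing video or generating audio. It illustrates only how a staged pipeline might expose an override point after each module.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class ProductionState:
    """Hypothetical container for what flows between stages."""
    clip_path: str
    description: Optional[str] = None            # semantic control
    rhythm_spots: List[float] = field(default_factory=list)  # seconds
    draft_audio: Optional[str] = None            # stand-in for a waveform
    final_mix: Optional[str] = None              # stand-in for the mixed track


Stage = Callable[[ProductionState], ProductionState]
Hook = Callable[[str, ProductionState], ProductionState]


def visual_processing(state: ProductionState) -> ProductionState:
    # Placeholder: a real module would derive these from the clip.
    state.description = "placeholder description"
    state.rhythm_spots = [0.0, 1.5]
    return state


def rhythm_controllable_generation(state: ProductionState) -> ProductionState:
    # Placeholder: stands in for rhythm-conditioned waveform generation.
    state.draft_audio = f"draft conditioned on {len(state.rhythm_spots)} rhythm spots"
    return state


def multi_agent_arrange_and_mix(state: ProductionState) -> ProductionState:
    # Placeholder: stands in for assessment, arrangement, and mix.
    state.final_mix = f"mix of ({state.draft_audio})"
    return state


STAGES: List[Tuple[str, Stage]] = [
    ("visual_processing", visual_processing),
    ("rhythm_controllable_generation", rhythm_controllable_generation),
    ("multi_agent_arrange_and_mix", multi_agent_arrange_and_mix),
]


def run_pipeline(
    state: ProductionState,
    stages: List[Tuple[str, Stage]],
    on_stage_done: Optional[Hook] = None,
) -> ProductionState:
    """Run stages in order; the optional hook lets a user edit the state
    after each stage before the next one consumes it."""
    for name, stage in stages:
        state = stage(state)
        if on_stage_done is not None:
            state = on_stage_done(name, state)
    return state


def user_edit(name: str, state: ProductionState) -> ProductionState:
    # Example intervention: the user replaces the detected rhythm spots.
    if name == "visual_processing":
        state.rhythm_spots = [0.0, 1.0, 2.0]
    return state


if __name__ == "__main__":
    result = run_pipeline(ProductionState("clip.mp4"), STAGES, user_edit)
    print(result.final_mix)
```

Passing a hook rather than hard-coding prompts keeps the stages independent of any particular user interface, which is one plausible way to support the kind of step-by-step intervention the abstract describes.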
