ScripTONES: Sentiment-Conditioned Music Generation for Movie Scripts

Vishruth Veerendranath, Vibha Masti, Utkarsh Gupta, Hrishit Chaudhuri, Gowri Srinivasa

TL;DR

ScripTONES addresses the challenge of generating film scores from scripts with a two-stage approach: it first maps scene sentiment to the Valence-Arousal (VA) space using the NRC VAD lexicon, and then conditionally generates piano MIDI music with either a Transformer-based EMOPIA-CWT model or a MusicVAE model augmented with attribute-vector arithmetic. It introduces continuous sentiment conditioning via latent vectors and discrete conditioning via quadrant labels, plus latent-space regularization (continuous and discrete) to improve alignment between the generated music and the target sentiment. The authors evaluate the approach through a qualitative user study, which shows that EMOPIA-CWT often yields more nuanced music while MusicVAE enables fine-grained sentiment manipulation; they also demonstrate attribute-vector arithmetic and interpolation for sentiment that evolves within a scene. Overall, the work provides a practical, controllable pipeline for low-cost, sentiment-aware music generation from movie scripts, with implications for indie and small-team productions.
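
The sentiment-analysis stage can be pictured with a minimal sketch, assuming a scene's valence and arousal are simply the averages of the lexicon scores of the words it contains, on the 0 to 1 scale of the original NRC VAD release, and that quadrants are split at 0.5. The lexicon entries below (TOY_VAD) are made-up placeholders, not real NRC VAD values, and the helpers scene_va and va_quadrant are hypothetical; the paper's actual aggregation may differ. The quadrant labels follow the Russell-style convention used by EMOPIA (Q1 high valence/high arousal, Q2 low/high, Q3 low/low, Q4 high/low).

```python
import re
from statistics import mean

# Hypothetical toy entries standing in for the NRC VAD lexicon.
# Each word maps to (valence, arousal, dominance); the numbers are
# illustrative placeholders, NOT real lexicon values.
TOY_VAD = {
    "scream": (0.20, 0.90, 0.50),
    "blood": (0.15, 0.80, 0.45),
    "run": (0.55, 0.80, 0.60),
    "smile": (0.90, 0.45, 0.60),
    "quiet": (0.60, 0.15, 0.40),
    "alone": (0.20, 0.25, 0.30),
}


def scene_va(text, lexicon=TOY_VAD):
    """Average valence and arousal over the tokens found in the lexicon.

    Returns None when no token matches. Plain averaging is an assumed
    aggregation scheme; dominance is looked up but ignored.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [lexicon[t] for t in tokens if t in lexicon]
    if not hits:
        return None
    valence = mean(h[0] for h in hits)
    arousal = mean(h[1] for h in hits)
    return valence, arousal


def va_quadrant(valence, arousal, center=0.5):
    """Map a VA point on a 0-1 scale to a quadrant label (assumed threshold)."""
    high_v = valence >= center
    high_a = arousal >= center
    if high_v and high_a:
        return "Q1"  # e.g. happy, excited
    if high_a:
        return "Q2"  # e.g. tense, angry
    if not high_v:
        return "Q3"  # e.g. sad, depressed
    return "Q4"      # e.g. calm, relaxed
```

For the line "A scream in the dark, blood on the floor, and they run.", only scream, blood, and run match, giving roughly V = 0.30 and A = 0.83, which falls in Q2. A continuous (V, A) pair like this suits continuous conditioning, while the quadrant label suits discrete conditioning.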

Abstract

Film scores are considered an essential part of the cinematic experience, but composing a film score is often expensive and infeasible for small-scale creators. Automating film score composition would provide useful starting points for music in small projects. In this paper, we propose a two-stage pipeline for generating music from a movie script. The first phase, Sentiment Analysis, encodes the sentiment of a scene from the film script into the continuous valence-arousal space. The second phase, Conditional Music Generation, takes the valence-arousal vector as input and generates piano MIDI music conditioned on it to match the sentiment. We study the efficacy of various music generation architectures through a qualitative user survey and propose methods to improve sentiment conditioning in VAE architectures.
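
The attribute-vector arithmetic used with MusicVAE can be sketched as follows, assuming an already-trained encoder has produced latent codes for a set of clips with known valence (or arousal) scores. In the spirit of the attribute vectors described in the original MusicVAE work, the direction is the mean latent code of high-scoring clips minus that of low-scoring clips; the quartile split and the helper names attribute_vector and shift are assumptions for illustration. Decoding the shifted code back to MIDI is left to the VAE decoder and is not shown.

```python
import numpy as np


def attribute_vector(latents, scores, quantile=0.25):
    """Mean latent of high-attribute clips minus mean latent of low-attribute clips.

    latents: (N, D) array of encoder outputs; scores: (N,) attribute values
    such as valence. The top/bottom-quantile split is an assumed choice.
    """
    latents = np.asarray(latents, dtype=float)
    scores = np.asarray(scores, dtype=float)
    lo, hi = np.quantile(scores, [quantile, 1.0 - quantile])
    high = latents[scores >= hi]  # clips with the strongest attribute
    low = latents[scores <= lo]   # clips with the weakest attribute
    return high.mean(axis=0) - low.mean(axis=0)


def shift(z, attr_vec, alpha):
    """Move latent code z along the attribute direction by step size alpha."""
    return np.asarray(z, dtype=float) + alpha * np.asarray(attr_vec, dtype=float)
```

With synthetic latents in which only the first dimension tracks valence, the recovered vector points mostly along that dimension, and calling shift with a larger alpha pushes a clip's code toward higher valence, analogous to the increased-valence example of Figure 2(b). Because alpha is continuous, the same direction supports fine-grained adjustments rather than a jump between discrete labels.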

Paper Structure (25 sections, 5 equations, 5 figures, 1 table)

Figures (5)

  • Figure 1: Illustration of the ScripTONES pipeline, which consists of two major stages: Sentiment Analysis of Scripts and Conditional Music Generation. Music generation is achieved either with MusicVAE using attribute-vector arithmetic or with EMOPIA-CWT.
  • Figure 2: Original music (a) modified with Increased Valence (b) and Increased Arousal (c)
  • Figure 3: Discrete regularization and loss plot while fine-tuning FIGARO on EMOPIA data
  • Figure 4: User preferences among the music generation models, alongside users' level of music knowledge
  • Figure 5: Latent Space Interpolation
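
Latent-space interpolation for sentiment that evolves within a scene can be illustrated with spherical linear interpolation (slerp), a common choice for moving between codes in a Gaussian VAE latent space. Whether ScripTONES interpolates spherically or linearly is an assumption of this sketch, as are the helper names slerp and interpolation_path. Each returned code would be decoded into one segment of music, letting the mood drift from the start of a scene to its end.

```python
import numpy as np


def slerp(z0, z1, t):
    """Spherical linear interpolation between latent codes z0 and z1 at t in [0, 1]."""
    z0 = np.asarray(z0, dtype=float)
    z1 = np.asarray(z1, dtype=float)
    # Angle between the two codes, clipped to keep arccos numerically safe.
    cos_omega = np.dot(z0, z1) / (np.linalg.norm(z0) * np.linalg.norm(z1))
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    sin_omega = np.sin(omega)
    if sin_omega < 1e-8:
        # (Anti)parallel codes: slerp is ill-defined, so fall back to linear interpolation.
        return (1.0 - t) * z0 + t * z1
    return (np.sin((1.0 - t) * omega) * z0 + np.sin(t * omega) * z1) / sin_omega


def interpolation_path(z_start, z_end, steps):
    """Evenly spaced latent codes from z_start (t=0) to z_end (t=1)."""
    return [slerp(z_start, z_end, t) for t in np.linspace(0.0, 1.0, steps)]
```

Slerp tends to keep intermediate codes closer to the typical norm of samples from the prior, whereas straight linear interpolation between two such codes passes through lower-norm regions that the prior rarely produces.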