Automatic Stage Lighting Control: Is it a Rule-Driven Process or Generative Task?

Zijian Zhao, Dian Jin, Zijing Zhou, Xiaoyu Zhang

TL;DR

This work reframes Automatic Stage Lighting Control (ASLC) as a generative, cross-modal task rather than a fixed rule-based mapping. It introduces Skip-BART, an end-to-end transformer model that predicts stage lighting hue and intensity from music, using a skip connection to align music and lighting at the frame level and transfer learning from PianoBART. The authors release RPMC-L^2, which they describe as the first stage lighting dataset, to enable learning from professional lighting data, together with pre-training, fine-tuning, and a Restricted Stochastic Temperature-Controlled sampling strategy for stable, diverse outputs. Quantitative and human evaluations show that Skip-BART outperforms rule-based methods across all evaluation metrics and shows only a limited gap to human lighting engineers, highlighting the potential for practical, data-driven, generative ASLC. The work also outlines limitations and avenues for future research, including real-time control, multi-light control, and richer multimodal signals.

Abstract

Stage lighting is a vital component of live music performances, shaping an engaging experience for both musicians and audiences. In recent years, Automatic Stage Lighting Control (ASLC) has attracted growing interest due to the high costs of hiring or training professional lighting engineers. However, most existing ASLC solutions only classify music into limited categories and map them to predefined light patterns, resulting in formulaic and monotonous outcomes that lack rationality. To address this gap, this paper presents Skip-BART, an end-to-end model that learns directly from experienced lighting engineers and predicts vivid, human-like stage lighting. To the best of our knowledge, this is the first work to conceptualize ASLC as a generative task rather than merely a classification problem. Our method adapts the BART model to take audio music as input and produce light hue and value (intensity) as output, incorporating a novel skip connection mechanism to enhance the relationship between music and light within the frame grid. To address the lack of available datasets, we create the first stage lighting dataset, along with several pre-training and transfer learning techniques to improve model training with limited data. We validate our method through both quantitative analysis and a human evaluation, demonstrating that Skip-BART outperforms conventional rule-based methods across all evaluation metrics and shows only a limited gap compared to real lighting engineers. To support further research, we have made our self-collected dataset, code, and trained model parameters available at https://github.com/RS2002/Skip-BART .

Paper Structure

This paper contains 30 sections, 14 equations, 5 figures, 9 tables.

Figures (5)

  • Figure 1: Network Architecture: In the figure, 'ice' represents the frozen parameters, and 'fire' denotes the trainable ones.
  • Figure 2: Each subfigure shows the input scene on the top, the result of the direct HSV extraction method in the bottom-left, and our extraction method in the bottom-right. (a)-(b) Both methods accurately extract the dominant hue. (c)-(d) Our method extracts colors closer to the original appearance.
  • Figure 3: Workflow of Skip-BART
  • Figure 4: Visualization of a VMM Fitted to Hue Data: The method enables the decomposition of a mixed Hue distribution into several color components.
  • Figure 5: Visualization of lighting sequences generated by different methods. The top row shows the input Mel spectrogram, the middle row is the ground-truth lighting sequence, and the bottom row is the predicted sequence from the Skip-BART model. Each color represents a unique combination of lighting color and brightness over time. The red box in (b) highlights a representative segment where Skip-BART closely matches the temporal lighting structure of the ground truth.
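
Figures 2 and 4 both work with hue, which is an angle on the color wheel rather than an ordinary number. The sketch below is not taken from the paper; it only illustrates, under that assumption, why hue data calls for circular statistics (the motivation for fitting a circular mixture as in Figure 4). The function names `rgb_to_hue_value` and `circular_mean_hue` are hypothetical, and only the Python standard library is used.

```python
import colorsys
import math


def rgb_to_hue_value(r, g, b):
    """Convert 8-bit RGB to (hue in degrees, value in [0, 1])."""
    h, _s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0, v


def circular_mean_hue(hues_deg):
    """Average hue angles on the unit circle instead of on the number line."""
    s = sum(math.sin(math.radians(h)) for h in hues_deg)
    c = sum(math.cos(math.radians(h)) for h in hues_deg)
    return math.degrees(math.atan2(s, c)) % 360.0


# Two slightly different reds, one on each side of the 0/360 degree wrap-around.
reds = [350.0, 10.0]

# The arithmetic mean lands on 180 degrees (cyan), the opposite side of the wheel.
naive_mean = sum(reds) / len(reds)

# The circular mean stays next to 0/360 degrees (red).
circ_mean = circular_mean_hue(reds)

# Hue and value of a pure blue pixel.
blue_hue, blue_value = rgb_to_hue_value(0, 0, 255)
```

Averaging the sine and cosine components is the simplest circular statistic; a mixture model over hue generalizes the same idea to several color components at once.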