AILive Mixer: A Deep Learning based Zero Latency Automatic Music Mixer for Live Music Performances

Devansh Zurale; Iris Lorente; Michael Lester; Alex Mitchell

Abstract

We present a deep learning-based automatic multitrack music mixing system designed for live performances. In a live performance, channels are often corrupted by acoustic bleed from co-located instruments. Moreover, audio-visual synchronization is critical, which places a tight constraint on audio latency. This work primarily tackles these two challenges: handling bleed in the input channels and producing the music mix with zero latency. Although automatic music mixing has seen several recent developments, most, if not all, previous work focuses on offline production with isolated instrument signals; to the best of our knowledge, this is the first end-to-end deep learning system developed for live music performances. The proposed system currently predicts mono gains for a multitrack input, but its design, together with the precedent set by past work, allows straightforward extension to predicting other relevant music mixing parameters in future work.
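To make "mono gains for a multitrack input" concrete, the following minimal sketch shows how one gain per channel turns a multitrack signal into a single mono mix. It is an illustration only, not the system described here: the function name `mix_mono`, the list-based signal representation, and the fixed gain values are all assumptions, and in the actual system the gains would come from the model rather than being set by hand.

```python
def mix_mono(tracks, gains):
    """Sum gain-weighted channels into one mono signal.

    tracks: list of channels, each a list of float samples of equal length.
    gains:  one linear gain per channel (hypothetical stand-in for the
            gains a mixing model would predict).
    """
    if len(tracks) != len(gains):
        raise ValueError("need exactly one gain per channel")
    if not tracks:
        return []
    n = len(tracks[0])
    if any(len(ch) != n for ch in tracks):
        raise ValueError("all channels must have the same length")
    # Each output sample is the gain-weighted sum across channels.
    return [sum(g * ch[i] for g, ch in zip(gains, tracks)) for i in range(n)]


# Two channels of four samples each, with fixed illustrative gains.
tracks = [[1.0, 0.0, -1.0, 0.0], [0.5, 0.5, 0.5, 0.5]]
gains = [0.5, 2.0]
mix = mix_mono(tracks, gains)
print(mix)  # [1.5, 1.0, 0.5, 1.0]
```

In a live setting the gains would be updated frame by frame rather than held constant over the whole signal; the sketch leaves that out for brevity.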

Paper Structure (13 sections, 4 figures)

Introduction
AiLive Mixer System Design
Model Architecture
Multi-Rate (MR) Processing
Zero Latency Processing
How ALM differs from DMC
Training Data & Data Augmentation
Training Methods
Experiments
Results & Discussion
Audio Results Discussion
Conclusion
Acknowledgements

Figures (4)

Figure 1: AiLive Mixer - System Overview. Blue blocks to the left of the RATE-SPLIT-LINE operate with a frame size of $F1 = 975$ ms. Orange blocks to the right of the line operate with a frame size of $F2$. In MR mode, $F2 = 50$ ms; in SR mode, $F2 = F1 = 975$ ms.
Figure 2: Multi-Rate Processing for a single channel.
Figure 3: Violin + Box: Absolute Ratings
Figure 4: Violin + Box: Ratings normalized per song per person
