ReaLJam: Real-Time Human-AI Music Jamming with Reinforcement Learning-Tuned Transformers
Alexander Scarlatos, Yusong Wu, Ian Simon, Adam Roberts, Tim Cooijmans, Natasha Jaques, Cassie Tarakajian, Cheng-Zhi Anna Huang
TL;DR
ReaLJam tackles real-time human-AI jamming by enabling low-latency collaboration between a Transformer-based chord-generation agent and a human musician. The system combines an RL-tuned ReaLchords model, an anticipation-driven interface that visualizes the agent's plan via a waterfall display, and a robust client-server protocol. A user study with six experienced musicians shows that the approach yields enjoyable, musically interesting performances, and it highlights the importance of RL training and of giving users a high degree of control over interface settings. The work provides a viable blueprint for real-time generative accompaniment and offers design guidance for future interactive AI music systems.
Abstract
Recent advances in generative artificial intelligence (AI) have created models capable of high-quality musical content generation. However, little consideration has been given to how to use these models in real-time or cooperative musical jamming applications, which require several crucial features: low latency, the ability to communicate planned actions, and the ability to adapt to user input in real time. To support these needs, we introduce ReaLJam, an interface and protocol for live musical jamming sessions between a human and a Transformer-based AI agent trained with reinforcement learning. We enable real-time interactions using the concept of anticipation, where the agent continually predicts how the performance will unfold and visually conveys its plan to the user. We conduct a user study in which experienced musicians jam in real time with the agent through ReaLJam. Our results demonstrate that ReaLJam enables enjoyable and musically interesting sessions, and we uncover important takeaways for future work.
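To make the idea of anticipation more concrete, the following is a minimal sketch of how a client might keep a rolling plan of upcoming chords while new predictions arrive from the agent. It is an illustration under stated assumptions, not the ReaLJam implementation: the class name `AnticipationBuffer`, the `commit_beats` parameter, and the rule that chords close to the playhead are frozen are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AnticipationBuffer:
    """Rolling plan of future chords, one per beat (hypothetical structure)."""

    # Assumption: chords fewer than `commit_beats` ahead of the playhead
    # are frozen, so what the user already sees does not change last-minute.
    commit_beats: int
    plan: Dict[int, str] = field(default_factory=dict)  # beat index -> chord

    def update(self, now_beat: int, start_beat: int, predicted: List[str]) -> None:
        """Merge a fresh prediction whose first chord lands on `start_beat`."""
        frozen_until = now_beat + self.commit_beats
        for offset, chord in enumerate(predicted):
            beat = start_beat + offset
            if beat < now_beat:
                # The prediction arrived late; this beat has already passed.
                continue
            if beat >= frozen_until or beat not in self.plan:
                # Overwrite only the uncommitted part of the plan,
                # or fill a beat that has no chord yet.
                self.plan[beat] = chord

    def upcoming(self, now_beat: int, horizon: int) -> List[Optional[str]]:
        """Chords to display for the next `horizon` beats (None if unplanned)."""
        return [self.plan.get(b) for b in range(now_beat, now_beat + horizon)]


buf = AnticipationBuffer(commit_beats=2)
buf.update(now_beat=0, start_beat=0, predicted=["C", "F", "G", "C"])
# One beat later the agent revises its plan after hearing the user.
buf.update(now_beat=1, start_beat=1, predicted=["Am", "Dm", "E", "Am"])
print(buf.upcoming(now_beat=1, horizon=4))  # ['F', 'G', 'E', 'Am']
```

In this sketch the two beats nearest the playhead keep their earlier chords, while later beats adopt the revised prediction. A display such as a waterfall view could render the result of `upcoming` so the user sees the agent's plan before it sounds.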
