Augmenting Online Meetings with Context-Aware Real-time Music Generation

Haruki Suzawa, Ko Watanabe, Andreas Dengel, Shoya Ishimaru

TL;DR

This work investigates using GenAI to generate context-aware background music during online meetings to counter cognitive fatigue and boost engagement. It introduces Discussion Jockey 2, a transcript-driven pipeline that uses Whisper for speech-to-text, GPT-4 to craft music prompts, and MusicGen to produce music in real time, which is looped for continuous playback over a meeting session. In a 14-participant online interview study, participants reported average scores of 5.75/9 for relaxation and 5.86/9 for concentration. Reception was generally positive, though the study also highlighted the need for personalization and faster real-time processing. The findings demonstrate the potential of context-aware musical augmentation to improve perceived ease and focus in virtual meetings, and they guide future enhancements for personalization and environmental adaptation.

Abstract

As online communication continues to expand, participants often face cognitive fatigue and reduced engagement. Cognitive augmentation, which leverages technology to enhance human abilities, offers promising solutions to these challenges. In this study, we investigate the potential of generative artificial intelligence (GenAI) for real-time music generation to enrich online meetings. We introduce Discussion Jockey 2, a system that dynamically produces background music in response to live conversation transcripts. Through a user study involving 14 participants in an online interview setting, we examine the system's impact on relaxation, concentration, and overall user experience. The findings reveal that AI-generated background music significantly enhances user relaxation (average score: 5.75/9) and concentration (average score: 5.86/9). This research underscores the promise of context-aware music generation in improving the quality of online communication and points to future directions for optimizing its implementation across various virtual environments.

Paper Structure

This paper contains 5 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: Proposed architecture of Discussion Jockey 2. The application uses the Whisper API to collect speech transcripts. The transcript and a template prompt are input to GPT-4 to generate a music description prompt optimized for the MusicGen API. The generated music is then fed back to the application and played in the participant interface.
  • Figure 2: Experiment design. The first piece of music is generated from the transcription of the first three minutes of the interview. While that music is played, the next three minutes of transcription are used to generate the next piece. The procedure is repeated until five different pieces of music have been played.
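
The two captions above describe a three-stage pipeline (speech-to-text, prompt writing, music generation) run once per three-minute window. The following is a minimal sketch of that control flow only, not the authors' implementation. The names `run_session`, `Track`, and `PROMPT_TEMPLATE` are hypothetical, the template wording is a placeholder, and the three stages are passed in as plain callables so that real Whisper, GPT-4, and MusicGen clients (not shown here) could be substituted.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Callable, Iterable, List

# Hypothetical stage signatures; the paper does not specify these interfaces.
Transcribe = Callable[[bytes], str]     # e.g. a wrapper around a Whisper call
DescribeMusic = Callable[[str], str]    # e.g. a wrapper around a GPT-4 call
GenerateMusic = Callable[[str], bytes]  # e.g. a wrapper around a MusicGen call

# Placeholder wording, assumed for illustration; not the authors' template.
PROMPT_TEMPLATE = (
    "Describe background music that suits this conversation:\n{transcript}"
)


@dataclass
class Track:
    source_window: int  # index of the three-minute window that was transcribed
    plays_during: int   # index of the following window, when the track is played
    description: str    # music description produced by the language model
    audio: bytes        # generated audio, looped by the player until replaced


def run_session(
    windows: Iterable[bytes],
    transcribe: Transcribe,
    describe: DescribeMusic,
    generate: GenerateMusic,
    max_tracks: int = 5,  # Figure 2: stop after five pieces of music
) -> List[Track]:
    """Turn consecutive audio windows into at most `max_tracks` tracks."""
    tracks: List[Track] = []
    for i, audio_window in enumerate(islice(windows, max_tracks)):
        transcript = transcribe(audio_window)
        description = describe(PROMPT_TEMPLATE.format(transcript=transcript))
        audio = generate(description)
        # Music built from window i is heard while window i + 1 is recorded.
        tracks.append(Track(i, i + 1, description, audio))
    return tracks


if __name__ == "__main__":
    # Stand-in stages so the sketch runs without any external service.
    demo = run_session(
        windows=[b"window-%d" % i for i in range(6)],
        transcribe=lambda audio: audio.decode(),
        describe=lambda prompt: "calm piano for " + prompt.splitlines()[-1],
        generate=lambda description: description.encode(),
    )
    for track in demo:
        print(track.source_window, track.plays_during, track.description)
```

Passing the stages as callables keeps the scheduling logic testable on its own. In a live system, generation for one window would presumably run concurrently with playback of the previous track, which this sequential sketch does not model.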