Exploratory Study Of Human-AI Interaction For Hindustani Music

Nithya Shikarpur, Cheng-Zhi Anna Huang

TL;DR

This paper presents a study of participants interacting with and using GaMaDHaNi, a novel hierarchical generative model for Hindustani vocal contours, to better understand the expectations, reactions, and preferences of practicing musicians when engaging with such a model.

Abstract

This paper presents a study of participants interacting with and using GaMaDHaNi, a novel hierarchical generative model for Hindustani vocal contours. To explore possible use cases in human-AI interaction, we conducted a user study with three participants, each engaging with the model through three predefined interaction modes. Although this study was conducted "in the wild" (with the model unadapted for the shift from the training data to real-world interaction), we use it as a pilot to better understand the expectations, reactions, and preferences of practicing musicians when engaging with such a model. We identify two challenges the participants faced: (1) the lack of restrictions in model output, and (2) the incoherence of model output. We situate these challenges in the context of Hindustani music and suggest future directions for model design to address these gaps.

Paper Structure

This paper contains 12 sections, 3 equations, and 2 figures.

Figures (2)

  • Figure 1: The hierarchical structure of the generative model GaMaDHaNi, comprising a Pitch Generator, a Spectrogram Generator, and a vocoder.
  • Figure 2: Extracted pitch contours of call-and-response examples in which coherence was maintained (top) and not maintained (bottom). Top: The model output stayed within the scale used in the input. The scale notes used in the input are highlighted in horizontal grey boxes, and a rough transcription by the author, a trained Hindustani musician, is shown as a black line to make the notes used clear. Bottom: An example of a model output that P1 found to have a 'different structure'. While the input is very simple, the output has much more movement. Additionally, the use of notes outside the scale (highlighted in red) contributes to the incoherence.