Input-Envelope-Output: Auditable Generative Music Rewards in Sensory-Sensitive Contexts

Cong Ye, Songlin Shang, Xiaoxu Ma, Xiangbo Zhang

TL;DR

A constraint-first Input-Envelope-Output (I-E-O) framework that makes safety explicit and verifiable while preserving action-output causality. The authors derive four verifiable design principles from it and instantiate them in MusiBubbles, a web-based prototype.

Abstract

Generative feedback in sensory-sensitive contexts poses a core design challenge: large individual differences in sensory tolerance make it difficult to sustain engagement without compromising safety. This tension is exemplified in autism spectrum disorder (ASD), where auditory sensitivities are common yet highly heterogeneous. Existing interactive music systems typically encode safety implicitly within direct input-output (I-O) mappings, which can preserve novelty but make system behavior hard to predict or audit. We instead propose a constraint-first Input-Envelope-Output (I-E-O) framework that makes safety explicit and verifiable while preserving action-output causality. I-E-O introduces a low-risk envelope layer between user input and audio output to specify safe bounds, enforce them deterministically, and log interventions for audit. From this architecture, we derive four verifiable design principles and instantiate them in MusiBubbles, a web-based prototype. Contributions include the I-E-O architecture, MusiBubbles as an exemplar implementation, and a reproducibility package to support adoption in ASD and other sensory-sensitive domains.
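The envelope layer described in the abstract (specify safe bounds, enforce them deterministically, log interventions) can be illustrated with a minimal Python sketch. It is not the MusiBubbles implementation: the names `Bound` and `Envelope`, the parameter keys, and the numeric bounds are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Bound:
    """Closed interval [lo, hi] that a parameter must stay within."""
    lo: float
    hi: float


@dataclass
class Envelope:
    """Hypothetical low-risk envelope: clamps parameters and logs each intervention."""
    bounds: dict[str, Bound]
    log: list[dict] = field(default_factory=list)

    def enforce(self, params: dict[str, float]) -> dict[str, float]:
        out: dict[str, float] = {}
        for name, value in params.items():
            bound = self.bounds.get(name)
            if bound is None:
                # No bound declared for this parameter: pass it through unchanged.
                out[name] = value
                continue
            # Deterministic enforcement: the same input always yields the same output.
            clamped = min(max(value, bound.lo), bound.hi)
            if clamped != value:
                # Record the intervention so a session report can be audited later.
                self.log.append({"param": name, "requested": value, "applied": clamped})
            out[name] = clamped
        return out


# Illustrative bounds only; these are assumptions, not values from the paper.
envelope = Envelope({"tempo_bpm": Bound(60.0, 120.0), "gain": Bound(0.0, 0.8)})
safe = envelope.enforce({"tempo_bpm": 150.0, "gain": 0.5})
```

In this sketch the user's action still determines the output whenever it falls inside the bounds, which is one way to read "preserving action-output causality"; only out-of-bound requests are altered, and each alteration leaves a log entry.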

Paper Structure (15 sections, 3 figures, 3 tables)

This paper contains 15 sections, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Design framework contrasting the Input-Output and Input-Envelope-Output approaches. (A) Theory Framework: design space leading to DR1–DR3. (B) Abstraction Layer: baseline direct mapping (top) vs. our constraint-first pipeline (bottom) with explicit low-risk envelope, compliance gate, and auditable session report.
  • Figure 2: MusiBubbles reference implementation. (A) Expert interface: Behavior Analysis panel shows click trail and pattern recognition results; Music Parameters panel provides configurable Low-risk Range bounds (orange regions indicate out-of-bound zones) under Default configuration. (B) Enforcement evidence (under Default configuration): (i) spectrogram comparison between baseline (a) and constrained (b) outputs showing differences in onset density, with corresponding loudness contours (LUFS) and range summaries (LRA) (c, d); spectrograms share the same color scale; (ii) summary table showing L2 parameters with baseline vs. constrained values and clamp status, plus resulting L1 signal metrics. This single-trace example serves as diagnostic evidence; aggregate statistics (N=660) are reported in Figure 3.
  • Figure 3: Envelope enforcement under Default configuration (N=660 traces). Top row (a–c): scatter plots comparing baseline values (x-axis) with post-enforcement values (y-axis) for tempo, gain, and accent ratio; dashed line indicates y=x, dotted lines denote lower/upper bounds; clamp rates are 91.5%, 18.0%, and 85.6%, respectively (gain shows lower clamp rate as user behavior rarely produces extreme values; accent ratio lower-bound clamping is rare as users seldom generate very low accent ratios). Bottom row (d–f): resulting distribution shifts in onset density, integrated loudness (LUFS), and loudness range (LRA); vertical dashed line indicates zero change. Additional configurations (Tight, Relaxed) are provided in supplementary materials.
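The clamp rates quoted in the Figure 3 caption can be read as the share of traces whose baseline value had to be changed by the envelope. The Python sketch below computes such a rate under that assumed definition; the function names, the sample tempo values, and the bounds are hypothetical and are not taken from the paper's data.

```python
def enforce(values: list[float], lo: float, hi: float) -> list[float]:
    """Clamp each baseline value into the closed interval [lo, hi]."""
    return [min(max(v, lo), hi) for v in values]


def clamp_rate(values: list[float], lo: float, hi: float) -> float:
    """Fraction of baseline values that lie outside [lo, hi] (assumed definition)."""
    if not values:
        raise ValueError("clamp_rate needs at least one value")
    clamped = sum(1 for v in values if v < lo or v > hi)
    return clamped / len(values)


# Made-up baseline tempos and bounds, for illustration only.
baseline_tempo = [50.0, 90.0, 130.0, 100.0]
post_tempo = enforce(baseline_tempo, 60.0, 120.0)
tempo_rate = clamp_rate(baseline_tempo, 60.0, 120.0)
```

Plotting `baseline_tempo` against `post_tempo` would give the kind of scatter the caption describes: in-bound points sit on the y=x line, while clamped points fall on the horizontal lines at the lower and upper bounds.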