A Reservoir-based Model for Human-like Perception of Complex Rhythm Pattern

Zhongju Yuan, Geraint Wiggins, Dick Botteldooren

TL;DR

To address how humans perceive and predict complex rhythms, the paper proposes a biologically inspired four-layer reservoir that implements predictive coding across the Tatum, Tactus, Higher Cognition, and Motor layers. The core dynamics follow $h_{t+1} = (1-\alpha)\,h_t + \alpha\, f(W_{\mathrm{in}} x_t + W h_t + \xi_t)$, with outputs $\hat{y}_t = W_{\mathrm{out}} h_t$, which enables predictions ahead of the input rhythm. The Tatum Layer extracts the smallest time unit, the Tactus Layer encodes meter, and the Higher Cognition Layer stores long-term rhythm memory to modulate the Motor Layer outputs, including damping them when missing beats are predicted (the prediction target is shifted 200 ms forward). Results show high synchronization accuracy across diverse rhythms and tempos, and beta-band-like motor activity aligns with human EEG studies, supporting the model's biological plausibility and its utility for understanding rhythmic cognition.
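The update rule above has the form of a standard leaky-integrator echo state network. The NumPy sketch below illustrates that single-reservoir equation. It is not the authors' four-layer model. Every specific choice is an assumption: a uniform random $W_{\mathrm{in}}$, a Gaussian $W$ rescaled to spectral radius 0.9, $f = \tanh$, a 10 ms time step, and a ridge-regression readout trained on the input shifted 200 ms forward. The helper names (`make_reservoir`, `run_reservoir`, `shift_target`, `fit_readout`) are hypothetical.

```python
import numpy as np

# Minimal leaky echo-state-network sketch of
#   h_{t+1} = (1 - alpha) h_t + alpha f(W_in x_t + W h_t + xi_t),   y_hat_t = W_out h_t
# All sizes, scalings, and the ridge readout are illustrative assumptions, not the paper's settings.

def make_reservoir(n_in, n_res, spectral_radius=0.9, seed=0):
    """Random W_in in [-1, 1] and Gaussian W rescaled to the given spectral radius (assumed choices)."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-1.0, 1.0, size=(n_res, n_in))
    W = rng.normal(0.0, 1.0, size=(n_res, n_res))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    return W_in, W


def run_reservoir(x, W_in, W, alpha=0.3, noise_std=0.0, f=np.tanh, seed=1):
    """Drive the reservoir with x of shape (T, n_in); row t of the result is the state after input x_t."""
    rng = np.random.default_rng(seed)
    h = np.zeros(W.shape[0])
    H = np.empty((x.shape[0], W.shape[0]))
    for t in range(x.shape[0]):
        xi = noise_std * rng.standard_normal(W.shape[0])  # noise term xi_t
        h = (1.0 - alpha) * h + alpha * f(W_in @ x[t] + W @ h + xi)
        H[t] = h
    return H


def shift_target(x, shift_ms=200.0, dt_ms=10.0):
    """Target y_t = x_{t+k}, k = shift_ms / dt_ms steps ahead; the last k steps are zero-padded."""
    k = int(round(shift_ms / dt_ms))
    if k == 0:
        return x.copy()
    y = np.zeros_like(x)
    y[:-k] = x[k:]
    return y


def fit_readout(H, y, ridge=1e-4):
    """Ridge-regression W_out of shape (n_out, n_res) so that H @ W_out.T approximates y (assumed training rule)."""
    n_res = H.shape[1]
    return np.linalg.solve(H.T @ H + ridge * np.eye(n_res), H.T @ y).T


if __name__ == "__main__":
    # Isochronous pulse train: IBI = 500 ms sampled every 10 ms (illustrative values).
    dt_ms = 10.0
    x = np.zeros((2000, 1))
    x[::50, 0] = 1.0
    W_in, W = make_reservoir(n_in=1, n_res=200)
    H = run_reservoir(x, W_in, W, alpha=0.3)
    y = shift_target(x, shift_ms=200.0, dt_ms=dt_ms)
    W_out = fit_readout(H, y)
    y_hat = H @ W_out.T  # y_hat_t = W_out h_t, intended to peak 200 ms before each input pulse
    print("training MSE:", float(np.mean((y_hat - y) ** 2)))
```

The leak rate `alpha` controls how quickly the state forgets earlier inputs. Because each update is a convex combination of the previous state and a `tanh` output, the state stays inside (-1, 1).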

Abstract

Rhythm is a fundamental aspect of human behaviour, present from infancy and deeply embedded in cultural practices. Rhythm anticipation is a spontaneous cognitive process that typically occurs before the onset of actual beats. While most research in both neuroscience and artificial intelligence has focused on metronome-based rhythm tasks, studies investigating the perception of complex musical rhythm patterns remain limited. To address this gap, we propose a hierarchical oscillator-based model to better understand the perception of complex musical rhythms in biological systems. The model consists of two types of coupled neurons that generate oscillations, with different layers tuned to respond to distinct perception levels. We evaluate the model using several representative rhythm patterns spanning the upper, middle, and lower bounds of human musical perception. Our findings demonstrate that, while maintaining a high degree of synchronization accuracy, the model exhibits human-like rhythmic behaviours. Additionally, the beta-band neuronal activity in the model mirrors patterns observed in the human brain, further validating the biological plausibility of the approach.

Paper Structure

This paper contains 7 sections, 9 equations, 9 figures, and 1 algorithm.

Figures (9)

  • Figure 1: Schematic representation of rhythm structure and the architecture of the proposed model for complex rhythm pattern prediction. (A) Definitions used in the rhythm illustration. All complex patterns are derived from an isochronous (ISO) rhythm, characterized by equal inter-beat intervals (IBIs), with the target IBI serving as the smallest time unit in the complex pattern. The complex pattern is a metric pattern in which each "on-beat" aligns with a beat of the ISO rhythm. Predictions are made 200 ms before each "on-beat" of the metric pattern. (B) The hierarchical musical perception structure described by Vuust et al. [vuust2014rhythmic]. Each metric level is recursively subdivided into equally spaced sub-units at the next lower level, defining the metric salience of positions within the rhythmic framework. The tactus lies at the midpoint of this structure. (C) The architecture of the proposed model, showing the flow of information within the system. The model consists of four layers with specialized functions; the Motor Layer generates the predictions.
  • Figure 2: Comparison of Tatum Inter-Beat Interval (IBI) Distributions Across Different Models After Pretraining. All models were pre-trained on the same single-channel rhythmic dataset, and IBIs were computed for each rhythm pattern at three distinct frequencies. The horizontal dashed lines indicate the corresponding tatum IBIs.
  • Figure 3: Comparison of Tatum Inter-Beat Interval Distributions Before and After Zero-Shot Learning of Our Model. After a brief input adaptation phase at the start, the output weight matrix is fixed. Following this adaptation, the IBIs of each pattern are calculated and compared with their pre-adaptation values at three distinct frequencies. The violin plots highlight two prominent peaks, with horizontal dashed lines indicating the corresponding tatum IBIs.
  • Figure 4: Comparison of Mean IBIs of the Tatum Layer Output Across Different Models, and for Our Model Before vs. After Zero-Shot Learning, Across Various Patterns and Frequencies. Panels (A, B, C) show the mean IBIs at each of the three frequencies. For each frequency, the mean output IBIs of all patterns are measured for three models, as well as for our model before and after zero-shot learning. The mean values for each model are shown as distinct colored lines in the corresponding radar plot, while the tatum IBIs and their multiples are shown as black dashed hollow circles in each panel.
  • Figure 5: Comparison of Tactus and Motor Layer Outputs Across Patterns and Frequencies. Panels (A, B, C) display the measurements for the Tactus Layer outputs, while Panels (D, E, F) present the corresponding measurements for the Motor Layer outputs. Synchronization Strength (range: 0–1) is shown in Panels (A, D). Mean Asynchrony (unit: ms) is illustrated in Panels (B, E). Inter-Beat Deviation (IBD, range: 0–1) is depicted in Panels (C, F).
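The captions evaluate layer outputs with IBIs, Synchronization Strength (range [0, 1]), and Mean Asynchrony (ms) but do not define them. The sketch below assumes common definitions, which may differ from the paper's:

- Onsets are upward threshold crossings of a sampled output.
- Synchronization strength is the mean resultant length of onset phases relative to the beat period. This is a circular-statistics measure bounded in [0, 1].
- Mean asynchrony is the mean signed offset from each onset to the nearest reference beat.

The function names, the 0.5 threshold, and the sampling step are hypothetical.

```python
import numpy as np

# Assumed evaluation helpers; not the paper's exact metric definitions.

def onset_times(y, dt_ms, threshold=0.5):
    """Times (ms) at which the sampled output y rises to or above `threshold` (assumed onset rule)."""
    y = np.asarray(y, dtype=float)
    above = y >= threshold
    rising = np.flatnonzero(above[1:] & ~above[:-1]) + 1
    if above.size and above[0]:
        rising = np.concatenate(([0], rising))  # count an onset at the very first sample
    return rising * dt_ms


def inter_beat_intervals(onsets_ms):
    """IBIs (ms) between consecutive onsets."""
    return np.diff(np.asarray(onsets_ms, dtype=float))


def synchronization_strength(onsets_ms, period_ms, phase_ref_ms=0.0):
    """Mean resultant length (0 to 1) of onset phases relative to the beat period; 1 = perfectly phase-locked."""
    phases = 2.0 * np.pi * (np.asarray(onsets_ms, dtype=float) - phase_ref_ms) / period_ms
    return float(np.abs(np.mean(np.exp(1j * phases))))


def mean_asynchrony(onsets_ms, beats_ms):
    """Mean signed offset (ms) of each onset from its nearest reference beat; negative = anticipation."""
    onsets = np.asarray(onsets_ms, dtype=float)
    beats = np.asarray(beats_ms, dtype=float)
    nearest = beats[np.argmin(np.abs(onsets[:, None] - beats[None, :]), axis=1)]
    return float(np.mean(onsets - nearest))


if __name__ == "__main__":
    # Output pulses every 500 ms, each 20 ms ahead of the reference beats (illustrative values).
    y = np.zeros(200)
    y[[10, 60, 110, 160]] = 1.0
    on = onset_times(y, dt_ms=10.0)
    print(inter_beat_intervals(on), synchronization_strength(on, 500.0),
          mean_asynchrony(on, [120.0, 620.0, 1120.0, 1620.0]))
```

Under these definitions, a consistent lead over the beat lowers mean asynchrony but leaves synchronization strength at 1. The phase-locking measure therefore captures consistency, while asynchrony captures anticipation.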