
BAN: Neuroanatomical Aligning in Auditory Recognition between Artificial Neural Network and Human Cortex

Haidong Wang, Pengfei Xiao, Ao Liu, Jianhua Zhang, Qia Shan

TL;DR

This work introduces BAN, a shallow, recurrent network that mirrors the human auditory cortex by mapping four cortical areas (A1, Belt, PB, T2/T3) and uses a brain-like auditory score (BAS) to gauge alignment with cortical activations and human genre choices. BAN achieves strong music genre recognition while maintaining high neuroanatomical similarity, demonstrated through fMRI-based cortical predictions and behavioral agreement. The BAS combines cortical and behavioral metrics into a single score, providing a principled framework to evaluate brain-like AI models. This approach advances interpretability and aligns artificial auditory processing with neuroanatomy, with potential implications for brain-computer interfaces and neuroscience-informed AI design.
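The TL;DR says BAS merges a cortical metric and a behavioral metric (denoted $s_r$ and $s_b$ in Figure 3) into a single score, but this summary gives no formula. Here is a minimal sketch. It assumes both component scores lie in [0, 1] and are combined by a weighted mean, with the default weight of 0.5 giving a plain average in the spirit of Brain-Score-style composite scores. The function name and the weight parameter `w` are hypothetical.

```python
def brain_like_auditory_score(s_r, s_b, w=0.5):
    """Hypothetical BAS: weighted mean of a cortical score s_r and a behavioral score s_b.

    Assumption: both scores are normalized to [0, 1]; w=0.5 gives the plain average.
    The weighting actually used for BAS is not stated in this summary.
    """
    for name, s in (("s_r", s_r), ("s_b", s_b), ("w", w)):
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {s}")
    # w controls how much the cortical term counts relative to the behavioral term.
    return w * s_r + (1.0 - w) * s_b
```

For instance, `brain_like_auditory_score(0.6, 0.8)` gives roughly 0.7. Raising `w` shifts emphasis toward cortical alignment, and lowering it shifts emphasis toward agreement with human choices.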

Abstract

Drawing inspiration from neuroscience, artificial neural networks (ANNs) have evolved from shallow architectures to highly complex, deep structures, yielding exceptional performance in auditory recognition tasks. However, traditional ANNs often struggle to align with brain regions because of their excessive depth and their lack of biologically realistic features such as recurrent connections. To address this, we introduce a brain-like auditory network (BAN), which incorporates four neuroanatomically mapped areas and recurrent connections, guided by a novel metric called the brain-like auditory score (BAS). BAS serves as a benchmark for evaluating the similarity between BAN and the human auditory recognition pathway. We further propose that specific areas of the cerebral cortex, mainly the middle and medial superior temporal (T2/T3) areas, correspond to the designed network structure, drawing parallels with the brain's auditory perception pathway. Our findings suggest that the network's neuroanatomical similarity to the cortex and its auditory classification ability are well aligned. In addition to delivering excellent performance on a music genre classification task, BAN achieves a high BAS. In conclusion, this study presents BAN, a recurrent, brain-inspired ANN, as the first model that mirrors the cortical pathway of auditory recognition.
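The abstract describes BAN as four neuroanatomically mapped areas with recurrent connections. The following plain-Python sketch shows how such a network could be unrolled over time steps. It has three stages:

- a feedforward A1 stage, using a dense projection as a stand-in for the convolutional preprocessing layer described in Figure 1;
- three recurrent areas for Belt, PB, and T2/T3, whose states carry over between time steps;
- a linear readout to genre logits.

All sizes, the random weights, the tanh update rule, and the names `ban_forward` and `RecurrentArea` are illustrative assumptions, not the authors' implementation. The default of ten outputs matches the ten GTZAN genres.

```python
import math
import random


def matvec(w, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def rand_matrix(rows, cols, rng, scale=0.3):
    """Small random weights; a stand-in for trained parameters."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class RecurrentArea:
    """Hypothetical recurrent cortical area: h_t = tanh(W_in x_t + W_rec h_{t-1})."""

    def __init__(self, n_in, n_hidden, rng):
        self.w_in = rand_matrix(n_hidden, n_in, rng)
        self.w_rec = rand_matrix(n_hidden, n_hidden, rng)
        self.n_hidden = n_hidden

    def run(self, inputs):
        h = [0.0] * self.n_hidden  # state before the first time step
        outputs = []
        for x in inputs:  # one input vector per time step
            drive = [a + b for a, b in zip(matvec(self.w_in, x), matvec(self.w_rec, h))]
            h = [math.tanh(v) for v in drive]  # new state feeds the next time step
            outputs.append(h)
        return outputs


def ban_forward(frames, n_genres=10, hidden=8, seed=0):
    """Toy BAN-like forward pass over a sequence of spectrogram frames."""
    rng = random.Random(seed)
    n_in = len(frames[0])
    # A1 stand-in: feedforward projection + ReLU, applied frame by frame (no recurrence).
    w_a1 = rand_matrix(hidden, n_in, rng)
    a1 = [[max(0.0, v) for v in matvec(w_a1, f)] for f in frames]
    # Belt -> PB -> T2/T3, each with its own recurrent state.
    belt = RecurrentArea(hidden, hidden, rng).run(a1)
    pb = RecurrentArea(hidden, hidden, rng).run(belt)
    t2t3 = RecurrentArea(hidden, hidden, rng).run(pb)
    # Linear readout (decoder) from the final T2/T3 state to genre logits.
    w_out = rand_matrix(n_genres, hidden, rng)
    return matvec(w_out, t2t3[-1])
```

For example, `ban_forward([[0.1 * (i + j) for j in range(16)] for i in range(5)])` returns ten logits for a five-step, 16-bin input, and the index of the largest logit would be the predicted genre. For readability, each area here consumes the whole output sequence of the previous area (layer-by-layer unrolling).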

Paper Structure

This paper contains 29 sections, 6 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Collaborative model between ANNs and neuroanatomy, guided by the brain-like auditory score (BAS). Using the quantitative BAS as a guide, we draw inspiration from the brain to inform the design of BAN. BAN consists of four regions mapped to the primary auditory cortex (A1), the belt or peripheral area (Belt), the parabelt cortex (PB), and the middle and superior temporal cortex areas (T2/T3). The CONVV1 layer is a conventional convolutional layer responsible for preprocessing and data-size reduction. RNNBelt, RNNPB, and RNNT2/T3 denote recurrent neural network modules for modeling [park2020circuit], as detailed in Sec. \ref{sec:cornet_s_def}. The upper-right panel illustrates the auditory recognition activations and predicted genre labels of the brain-aligned network, while the lower-right panel shows human activations and choices for the same stimulus (far left). This comparison highlights the relationship between BAN's predictions and the brain's auditory recognition responses, as explained in Sec. \ref{sec:experiments}.
  • Figure 2: The BAN circuitry is designed after the human auditory cortex pathway. Key cortical areas involved in auditory recognition are highlighted as orange modules. Auditory neural signals are generated from the input music $F_t$ by the cochlear hair cells. In the ventral pathway, A1 processes the auditory representations $v_t$, and neurons along the ventral "what" pathway (A1, Belt, PB, and T2/T3) extract auditory features from the preprocessed neural signal $m_t$. The Belt and PB are particularly important for refining features based on the temporal and rate codes $v_t$, and the output $p_t$ of PB is passed to the next module. The Belt, PB, and T2/T3 regions are recurrent. Finally, fully connected layers serve as the decoder that generates the auditory labels. Solid arrows show feature flow within a single time step, while dashed black arrows indicate temporal connections.
  • Figure 3: BAN circuitry analysis. Each row shows how the highest accuracy on the GTZAN dataset and the BAS, together with its cortical score $s_r$ and behavioral score $s_b$, change relative to the baseline model when a specific hyperparameter is modified.
  • Figure 4: Classification precision of the proposed BAN on the Music Genres dataset, showing the classification outputs inferred by BAN for a range of audio clips.
  • Figure 5: Visualized predictions. A sample of the input data is shown with its true and predicted class labels. The $x$-axis represents time, the $y$-axis represents frequency, and the colormap indicates decibel level. Several classes show clearly distinct features. For instance, the spectrogram for the country class displays simple melodies and steady rhythms over time, characteristic of country music, and it highlights the low-frequency sounds produced by instruments typical of the genre.
  • ...and 3 more figures
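Figures 1 and 3 compare BAN's activations and genre predictions with human cortical activations and genre choices, and report a cortical score $s_r$ and a behavioral score $s_b$. This summary defines neither score, so the following is only a sketch under two assumptions:

- $s_r$ is the mean Pearson correlation between model-predicted and measured responses across voxels, clipped at zero.
- $s_b$ is the fraction of stimuli on which the model's genre matches the human choice.

The function names and both definitions are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0  # correlation undefined for a constant series
    return cov / (sx * sy)


def cortical_score(predicted, measured):
    """Hypothetical s_r: mean Pearson r over voxels, clipped at 0.

    predicted[v] and measured[v] hold the responses of voxel v across stimuli.
    """
    rs = [pearson_r(p, m) for p, m in zip(predicted, measured)]
    return max(0.0, sum(rs) / len(rs))


def behavioral_score(model_choices, human_choices):
    """Hypothetical s_b: fraction of stimuli where the model's genre equals the human choice."""
    matches = sum(m == h for m, h in zip(model_choices, human_choices))
    return matches / len(human_choices)
```

For example, `behavioral_score(["rock", "jazz", "pop", "blues"], ["rock", "jazz", "pop", "metal"])` returns 0.75. Clipping $s_r$ at zero keeps both scores on a common [0, 1] scale, so they can be averaged into a single composite.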