Modelling the Effects of Hearing Loss on Neural Coding in the Auditory Midbrain with Variational Conditioning

Lloyd Pellatt, Fotios Drakopoulos, Shievanie Sabesan, Nicholas A. Lesica

TL;DR

The paper tackles the challenge of modeling neural coding in the auditory midbrain under hearing loss by learning a low-dimensional conditioning vector $\psi$ that encodes the distortions induced by hearing impairment (HI) directly from multi-animal neural data. It introduces a variational-conditional extension of ICNet, with a time-degradation module and a KL-based regularization, and compares conditioning on the learned $\psi$ with conditioning on auditory brainstem response (ABR) thresholds. The approach captures 62% of the explainable variance for normal-hearing animals and 68% for hearing-impaired animals, and enables rapid adaptation to unseen animals via Bayesian optimization over $\psi$, with performance on unseen animals improving slightly as more animals are included in training. This work paves the way for parametric, data-driven hearing-loss compensation approaches that target neural codes in the midbrain and could be quickly tuned to new listeners through human-in-the-loop optimization.

Abstract

The mapping from sound to neural activity that underlies hearing is highly non-linear. The first few stages of this mapping in the cochlea have been modelled successfully, with biophysical models built by hand and, more recently, with DNN models trained on datasets simulated by biophysical models. Modelling the auditory brain has been a challenge because central auditory processing is too complex for models to be built by hand, and datasets for training DNN models directly have not been available. Recent work has taken advantage of large-scale, high-resolution neural recordings from the auditory midbrain to build a DNN model of normal hearing with great success. But this model assumes that auditory processing is the same in all brains, and therefore it cannot capture the widely varying effects of hearing loss. We propose a novel variational-conditional model that learns to encode the space of hearing loss directly from recordings of neural activity in the auditory midbrain of healthy and noise-exposed animals. With hearing loss parametrised by only 6 free parameters per animal, our model accurately predicts 62% of the explainable variance in neural responses from normal-hearing animals and 68% for hearing-impaired animals, within a few percentage points of state-of-the-art animal-specific models. We demonstrate that the model can be used to simulate realistic activity from out-of-sample animals by fitting only the learned conditioning parameters with Bayesian optimisation, achieving cross-entropy loss within 2% of the optimum in 15-30 iterations. Including more animals in the training data slightly improved the performance on unseen animals. This model will enable future development of parametrised hearing loss compensation models trained to directly restore normal neural coding in hearing-impaired brains, which can be quickly fitted for a new user by human-in-the-loop optimisation.

Paper Structure

This paper contains 18 sections, 5 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: Architecture of $\psi$-ICNet. An input sound is processed by a convolutional encoder, consisting of a SincNet layer with 48 filters followed by 5 causal convolutional layers with 246 kernels of size 60 and a final layer with 64 kernels. The resulting representation $\hat{r}_b$ encodes latent dynamics shared across animals. The variational conditioning module introduces animal-specific information in the form of a set of values $\psi$ as described in Figure 2. A three-layer convolutional network with $N_\psi$, $2N_\psi$ and $3N_\psi$ kernels (where $N_\psi$ is the number of $\psi$ parameters) expands $\psi$ into a feature map $\hat{r}_{\psi}$ of size $320 \times 3N_{\psi}$, which is concatenated with $\hat{r}_b$. A further 4 convolutional layers containing progressively fewer kernels of size 16 then combine the shared and animal-specific features, reducing the dimensionality back to $320 \times 64$. This produces an individualised representation $\hat{r}_b'$, which is cropped to remove context. The decoder is shared between animals and outputs a categorical probability distribution $p(\hat{R}|s, \psi)$ over the number of spikes at each channel and timestep, from which we sample to predict $\hat{R}$.
  • Figure 2: Architecture of the variational conditioning module which encodes the effects of hearing loss on the generic latent representation. $\mu_{\psi}^t$ is a set of learned weights which form the means of a multivariate Gaussian distribution from which we sample (using the reparameterisation trick) to produce values which modify the bottleneck. The means are modified by the time transfer function which accounts for the change over time of the state of the animal during recording. A sigmoid function scales the modified weights between zero and one, and we sample from a normal distribution with learned covariance matrix $\Sigma_{\psi}$, which is diagonal.
  • Figure 3: Comparison of $\psi$-ICNet to single branch ICNet. a and b show examples of real and predicted MUA in response to clean speech and music respectively. The left column shows real MUA recorded from three animals — one NH, one mild HI and one severe HI. The centre and right columns show MUA predicted by the single branch models and $\psi$-ICNet respectively.
  • Figure 4: a and b show the number of iterations required to converge to the target loss when starting with 4, 8, or 16 random samples. c to f show the loss surface over a 2D projection of 1000 evenly spaced points in the $\psi$ space for 4 animals — two NH and two HI, two in-sample and two out of sample — for the 3 parameter $\psi$-ICNet. The colour of each point represents the loss obtained by generating MUA from the trained model with $\psi$ fixed to the given values. The black crosses represent the values of $\psi^*$ found by Bayesian optimisation over repeated runs, and the red star shows the $\psi$ value learned during training (or the best point found in a grid search for unseen animals). g to j show the same plots for the 6 parameter model, for which 4096 points in the $\psi$ space were evaluated in the grid search. k shows real (left) and simulated (right) MUA in response to a segment of speech for a model trained on 20 animals, 10 NH and 10 HI, when fitted to 4 different unseen animals. The hearing status of the animal is indicated on the right of the row. l and m show the relative FEVE of the 9 animal vs the 20 animal model for in-sample and out of sample animals, with NH animals highlighted in orange and HI animals in blue. n and o show the relative KL divergence. Different markers represent different sound classes. Key: R = ripples, SN = speech in noise, S = clean speech, M = music, I = single instruments.
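
The sampling step described in the Figure 2 caption can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `sample_psi`, the linear form of the time transfer function (`drift * t`), the log-standard-deviation parameterisation, and all numeric values are assumptions made for illustration, since the caption does not specify them. What it does follow from the caption is the order of operations: learned means are adjusted for recording time, squashed to the range zero to one by a sigmoid, and then perturbed by Gaussian noise with a diagonal covariance using the reparameterisation trick.

```python
import math
import random


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def sample_psi(mu, log_sigma, drift, t, rng):
    """Draw one sample of the conditioning vector psi.

    mu        : learned means, one per psi parameter
    log_sigma : learned log standard deviations (diagonal covariance)
    drift     : per-parameter slope of a HYPOTHETICAL linear time transfer
                function; the real functional form is not given in the source
    t         : normalised time within the recording (assumed to lie in [0, 1])
    rng       : random.Random instance supplying the noise
    """
    psi = []
    for m, ls, d in zip(mu, log_sigma, drift):
        # Adjust the mean for recording time, then scale it into (0, 1).
        mean = sigmoid(m + d * t)
        # Reparameterisation trick: sample = mean + sigma * eps, eps ~ N(0, 1),
        # so the sample stays differentiable with respect to mean and sigma.
        eps = rng.gauss(0.0, 1.0)
        psi.append(mean + math.exp(ls) * eps)
    return psi


# Illustrative values only: 6 parameters, matching the 6-parameter model.
rng = random.Random(0)
mu = [0.0, 1.0, -1.0, 0.5, -0.5, 2.0]
log_sigma = [-3.0] * 6
drift = [0.1] * 6
psi = sample_psi(mu, log_sigma, drift, t=0.5, rng=rng)
print(psi)
```

Because the noise is added after the sigmoid in this sketch, individual samples can fall slightly outside the zero-to-one range when the standard deviation is not small; whether the original model clips or otherwise constrains the samples is not stated in the captions.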