A General Close-loop Predictive Coding Framework for Auditory Working Memory
Zhongju Yuan, Geraint Wiggins, Dick Botteldooren
TL;DR
The paper addresses the lack of neural-network models for auditory working memory by introducing a general close-loop predictive coding framework. A two-layer network stores sequence information in its learnable weights during a write phase and recalls it during a read phase with those weights held fixed; a close-loop feedback mechanism enhances recall. The approach is evaluated on two diverse datasets, ESC-50 and LibriSpeech, using 200 ms segments. Recall fidelity is measured by textual semantic similarity (via CLAP captions and Whisper transcripts), and the resulting similarity scores are consistently above 0.7. These findings suggest the framework can robustly preserve meaningful auditory representations across environmental sounds and speech, highlighting a biologically inspired path for memory formation and retrieval in neural systems.
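The write-then-read idea can be illustrated with a toy example. The sketch below is not the authors' two-layer network: as a deliberately simplified stand-in, it fits a linear next-sample predictor by least squares (the "write" phase), then freezes the weights and recalls the segment by feeding each prediction back in as the next input (the "read" phase). The function names `write_phase` and `read_phase`, the predictor order, and the 440 Hz test tone are all assumptions made for illustration.

```python
import numpy as np

def write_phase(signal, order=2):
    """Store the segment in weights: fit w so that signal[t] ~ w . signal[t-order:t].

    Assumption: a linear least-squares predictor stands in for the
    paper's learned two-layer network.
    """
    contexts = np.stack([signal[i:i + order] for i in range(len(signal) - order)])
    targets = signal[order:]
    weights, *_ = np.linalg.lstsq(contexts, targets, rcond=None)
    return weights

def read_phase(weights, cue, n_steps):
    """Recall with frozen weights, feeding each prediction back as input."""
    order = len(weights)
    recalled = list(cue[:order])  # only the first `order` samples act as a cue
    for _ in range(n_steps - order):
        context = np.array(recalled[-order:])
        recalled.append(float(context @ weights))
    return np.array(recalled)

# Toy segment: 200 ms of a 440 Hz tone at 16 kHz (illustrative values).
sample_rate = 16_000
n_samples = sample_rate * 200 // 1000
t = np.arange(n_samples) / sample_rate
segment = np.sin(2 * np.pi * 440 * t)

weights = write_phase(segment)
recall = read_phase(weights, segment, len(segment))
error = float(np.max(np.abs(recall - segment)))
print(f"max recall error: {error:.2e}")
```

A pure tone is a best case, because it obeys an exact two-tap linear recurrence; real environmental sounds and speech would need a far more expressive predictor, which is where a learned network comes in.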
Abstract
Auditory working memory is essential for many daily activities, such as language acquisition and conversation. It involves the temporary storage and manipulation of information that is no longer present in the environment. While it has been studied extensively in neuroscience and cognitive science, work on modeling it with neural networks remains limited. To address this gap, we propose a general framework based on a close-loop predictive coding paradigm to perform short auditory signal memory tasks. The framework is evaluated on two widely used benchmark datasets, one for environmental sound and one for speech, and achieves high semantic similarity on both.
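The text here does not say how semantic similarity is computed. One common choice, assumed purely for illustration, is cosine similarity between embedding vectors of the text describing the original audio and the text describing the recalled audio. The helper `cosine_similarity` and the three-dimensional vectors below are hypothetical placeholders, not outputs of any real embedding model.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors, in [-1, 1]."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return float(a @ b / denom)

# Hypothetical embeddings of the caption/transcript for the original
# segment and for the recalled segment (made-up numbers).
original_vec = np.array([1.0, 0.0, 1.0])
recalled_vec = np.array([1.0, 0.0, 0.0])

score = cosine_similarity(original_vec, recalled_vec)
print(f"semantic similarity: {score:.4f}")
```

Under this assumed metric, a score of 1 means the two texts embed identically and a score near 0 means they are unrelated.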
