Modality-Dependent Memory Mechanisms in Cross-Modal Neuromorphic Computing
Effiong Blessing, Chiung-Yi Tseng, Somshubhra Roy, Junaid Rehman, Isaac Nkrumah
TL;DR
This work investigates whether memory mechanisms in memory-augmented spiking neural networks generalize across visual and auditory modalities. It conducts a systematic cross-modal ablation of Hopfield networks, Hierarchical Gated Recurrent Networks (HGRNs), and supervised contrastive learning (SCL) on the visual N-MNIST and auditory SHD datasets, comparing parallel and joint training. The results show modality-dependent preferences: Hopfield networks are strong on visual tasks but weaker on auditory ones, SCL is more balanced, and HGRN is robust across modalities. A unified, jointly trained HGRN matches the performance of parallel HGRN models while enabling single-model deployment. Engram analyses reveal modality-specific representations with weak cross-modal alignment. Across all architectures, the approach is more than 600x more energy efficient than conventional neural networks, demonstrating the viability of multi-sensory neuromorphic systems.
Abstract
Memory-augmented spiking neural networks (SNNs) promise energy-efficient neuromorphic computing, yet their generalization across sensory modalities remains unexplored. We present the first comprehensive cross-modal ablation study of memory mechanisms in SNNs, evaluating Hopfield networks, Hierarchical Gated Recurrent Networks (HGRNs), and supervised contrastive learning (SCL) across visual (N-MNIST) and auditory (SHD) neuromorphic datasets. Our systematic evaluation of five architectures reveals striking modality-dependent performance patterns. Hopfield networks achieve 97.68% accuracy on visual tasks but only 76.15% on auditory tasks, a 21.53-point gap that indicates severe modality-specific specialization. SCL demonstrates more balanced cross-modal performance (96.72% visual, 82.16% audio, a 14.56-point gap). These findings establish that memory mechanisms exhibit task-specific benefits rather than universal applicability. Joint multi-modal training with HGRN achieves 94.41% visual and 79.37% audio accuracy (88.78% average), matching parallel HGRN performance while allowing deployment as a single unified model. Quantitative engram analysis confirms weak cross-modal alignment (0.038 similarity), validating our parallel architecture design. Our work provides the first empirical evidence for modality-specific memory optimization in neuromorphic systems and achieves a 603x energy efficiency improvement over traditional neural networks.
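To make the cross-modal alignment figure more concrete, the following is a minimal sketch of how one such score could be computed. It assumes, without confirmation from the abstract, that an engram is summarized as a per-class mean feature vector and that alignment is the mean cosine similarity between same-class prototypes from the two modalities. The function names, the shared class labels, the feature dimension, and the random stand-in data are all hypothetical and are not taken from the paper.

```python
import math
import random


def class_mean(vectors):
    """Average a list of equal-length feature vectors into one prototype."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cross_modal_alignment(visual_by_class, audio_by_class):
    """Mean cosine similarity between same-class prototypes of two modalities.

    Assumption: both dicts map a class label to a list of feature vectors of
    the same dimension, and only labels present in both are compared.
    """
    shared = sorted(set(visual_by_class) & set(audio_by_class))
    if not shared:
        raise ValueError("no shared class labels between modalities")
    sims = [
        cosine_similarity(class_mean(visual_by_class[c]),
                          class_mean(audio_by_class[c]))
        for c in shared
    ]
    return sum(sims) / len(sims)


# Synthetic stand-in features (NOT data from the paper): 10 hypothetical
# shared classes, 20 samples each, 64-dimensional hidden representations.
rng = random.Random(0)


def fake_features(num_samples, dim):
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_samples)]


visual = {c: fake_features(20, 64) for c in range(10)}
audio = {c: fake_features(20, 64) for c in range(10)}

score = cross_modal_alignment(visual, audio)
print(f"cross-modal alignment: {score:.3f}")
```

Under this definition, a score near 0 means the two modalities encode the same class in nearly orthogonal directions, which is the regime the reported 0.038 similarity would fall into if the paper uses a comparable measure. A score near 1 would instead suggest a shared representation.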
