
Modality-Dependent Memory Mechanisms in Cross-Modal Neuromorphic Computing

Effiong Blessing, Chiung-Yi Tseng, Somshubhra Roy, Junaid Rehman, Isaac Nkrumah

TL;DR

This work investigates whether memory mechanisms in memory-augmented spiking neural networks (SNNs) generalize across visual and auditory modalities. It conducts a systematic cross-modal ablation of Hopfield networks, Hierarchical Gated Recurrent Networks (HGRNs), and supervised contrastive learning (SCL) on the visual N-MNIST and auditory SHD datasets, comparing parallel and joint training. The key findings are that preferences depend on modality (Hopfield is strong on visual tasks, SCL is the most balanced, and HGRN is robust across modalities) and that a unified HGRN can match the performance of parallel models while enabling single-model deployment. Engram analyses reveal modality-specific representations with weak cross-modal alignment. Across all architectures, the approach reduces energy use by more than 600x relative to conventional neural networks, demonstrating the viability of multi-sensory neuromorphic systems.

Abstract

Memory-augmented spiking neural networks (SNNs) promise energy-efficient neuromorphic computing, yet their generalization across sensory modalities remains unexplored. We present the first comprehensive cross-modal ablation study of memory mechanisms in SNNs, evaluating Hopfield networks, Hierarchical Gated Recurrent Networks (HGRNs), and supervised contrastive learning (SCL) across visual (N-MNIST) and auditory (SHD) neuromorphic datasets. Our systematic evaluation of five architectures reveals striking modality-dependent performance patterns: Hopfield networks achieve 97.68% accuracy on visual tasks but only 76.15% on auditory tasks (a 21.53-point gap), revealing severe modality-specific specialization, while SCL demonstrates more balanced cross-modal performance (96.72% visual, 82.16% audio, a 14.56-point gap). These findings establish that memory mechanisms exhibit task-specific benefits rather than universal applicability. Joint multi-modal training with HGRN achieves 94.41% visual and 79.37% audio accuracy (88.78% average), matching parallel HGRN performance while enabling unified, single-model deployment. Quantitative engram analysis confirms weak cross-modal alignment (0.038 similarity), validating our parallel architecture design. Our work provides the first empirical evidence for modality-specific memory optimization in neuromorphic systems, achieving 603x better energy efficiency than traditional neural networks.
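
The gap and average figures quoted for each mechanism follow directly from the per-modality accuracies. As a minimal sketch of that arithmetic, the snippet below recomputes them from the accuracies reported in the abstract and in the Figure 2 caption. The dictionary layout, the label "HGRN (parallel)", and the helper name `cross_modal_summary` are illustrative assumptions, not part of the paper.

```python
# Accuracies (%) as reported in the abstract and the Figure 2 caption.
# The labels and structure here are illustrative, not the authors' code.
reported = {
    "Hopfield": {"visual": 97.68, "audio": 76.15},
    "SCL": {"visual": 96.72, "audio": 82.16},
    "HGRN (parallel)": {"visual": 97.48, "audio": 80.08},
}


def cross_modal_summary(acc):
    """Return (visual-minus-audio gap in points, mean accuracy in %)."""
    gap = round(acc["visual"] - acc["audio"], 2)
    average = round((acc["visual"] + acc["audio"]) / 2, 2)
    return gap, average


summary = {name: cross_modal_summary(acc) for name, acc in reported.items()}
for name, (gap, average) in summary.items():
    print(f"{name}: gap = {gap:.2f} points, average = {average:.2f}%")
```

A smaller gap indicates more balanced cross-modal behavior, which is the sense in which SCL is described as balanced and Hopfield as visually specialized.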

Paper Structure

This paper contains 24 sections, 3 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Joint Architecture. Unified multi-modal neuromorphic system with modality-specific encoders, shared HGRN processing, and alternating batch training enabling competitive cross-modal performance (88.78% average) through single-model deployment.
  • Figure 2: Cross-Modal Performance Patterns. Modality-dependent architectural preferences: Hopfield networks excel on visual tasks (97.68%) but perform poorly on auditory tasks (76.15%), a gap of 21.53 percentage points. SCL achieves the best average cross-modal performance (89.44%). HGRN provides consistent performance (97.48% visual, 80.08% audio). Features extracted via rate encoding (mean firing rate over time).
  • Figure 3: Cross-Modal Engram Analysis. (Top row) t-SNE visualizations show Model 4 achieves exceptional visual engram formation (silhouette 0.871, left) while auditory engrams show moderate quality (0.216, right). (Bottom row) Cross-modal alignment matrices reveal near-zero similarity (0.038 for M2, left; -0.004 for M4, right), confirming modality-specific learning. Features extracted via rate encoding from balanced class sampling (100 samples per class).
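
The captions for Figures 2 and 3 mention rate encoding (mean firing rate over time) and cross-modal alignment matrices without giving their definitions. The sketch below shows one plausible reading, using NumPy: spike trains are averaged over the time axis, per-class centroids are formed, and the alignment matrix is the cosine similarity between centroids of the two modalities. The function names, the centering step, the shared feature dimension, the class count, and the synthetic spike data are all assumptions made for illustration; the paper's actual procedure may differ.

```python
import numpy as np


def rate_encode(spikes):
    """Mean firing rate over time: (samples, time, units) -> (samples, units)."""
    return np.asarray(spikes, dtype=float).mean(axis=1)


def class_centroids(features, labels, num_classes):
    """Average the rate features of each class into one centroid per class."""
    return np.stack([features[labels == c].mean(axis=0) for c in range(num_classes)])


def alignment_matrix(cent_a, cent_b, center=True, eps=1e-12):
    """Cosine similarity between every pair of class centroids of two modalities.

    Centering (an assumption) removes the shared positive offset of firing
    rates, so unrelated representations score near zero rather than near one.
    """
    if center:
        cent_a = cent_a - cent_a.mean(axis=0, keepdims=True)
        cent_b = cent_b - cent_b.mean(axis=0, keepdims=True)
    a = cent_a / (np.linalg.norm(cent_a, axis=1, keepdims=True) + eps)
    b = cent_b / (np.linalg.norm(cent_b, axis=1, keepdims=True) + eps)
    return a @ b.T


# Synthetic stand-in data: balanced sampling with 100 samples per class.
# The class count, time steps, and unit count are placeholders.
rng = np.random.default_rng(0)
num_classes, per_class, steps, units = 10, 100, 20, 64
labels = np.repeat(np.arange(num_classes), per_class)
visual_spikes = rng.random((labels.size, steps, units)) < 0.1
audio_spikes = rng.random((labels.size, steps, units)) < 0.1

visual_centroids = class_centroids(rate_encode(visual_spikes), labels, num_classes)
audio_centroids = class_centroids(rate_encode(audio_spikes), labels, num_classes)
matrix = alignment_matrix(visual_centroids, audio_centroids)

# Same-class alignment: mean of the diagonal of the class-by-class matrix.
print("alignment matrix shape:", matrix.shape)
print("mean same-class alignment:", float(np.mean(np.diag(matrix))))
```

Under this reading, a mean same-class alignment near zero means that the two modalities do not place the same class in similar directions of feature space, which is how the captions interpret the reported values.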