SteganoSNN: SNN-Based Audio-in-Image Steganography with Encryption

Biswajit Kumar Sahoo, Pedro Machado, Isibor Kennedy Ihianle, Andreas Oikonomou, Srinivas Boppu

TL;DR

SteganoSNN introduces a neuromorphic approach to audio-in-image steganography: audio is encoded as spike trains by LIF neurons, encrypted with a modulo-based scheme, and embedded into RGBA images with dithering. Implemented with NEST and a PYNQ-Z2 FPGA, it achieves $8$ bpp at high perceptual fidelity. It demonstrates higher payload and lower computational overhead than SteganoGAN while maintaining robustness to steganalysis, with PSNR in the $40.4$–$41.35$ dB range, SSIM above $0.97$, and full audio recovery. The work validates a practical, hardware-accelerated path for secure, edge-friendly multimedia hiding, with potential applications in Edge AI, IoT, and biomedical data transmission. It also lays the groundwork for future neuromorphic extensions to multimodal embedding and on-chip learning against adversarial steganalysis.

Abstract

Secure data hiding remains a fundamental challenge in digital communication, requiring a careful balance between computational efficiency and perceptual transparency. The balance between security and performance is increasingly fragile with the emergence of generative AI systems capable of autonomously generating and optimising sophisticated cryptanalysis and steganalysis algorithms, thereby accelerating the exposure of vulnerabilities in conventional data-hiding schemes. This work introduces SteganoSNN, a neuromorphic steganographic framework that exploits spiking neural networks (SNNs) to achieve secure, low-power, and high-capacity multimedia data hiding. Digitised audio samples are converted into spike trains using leaky integrate-and-fire (LIF) neurons, encrypted via a modulo-based mapping scheme, and embedded into the least significant bits of RGBA image channels using a dithering mechanism to minimise perceptual distortion. Implemented in Python using NEST and realised on a PYNQ-Z2 FPGA, SteganoSNN attains real-time operation with an embedding capacity of 8 bits per pixel. Experimental evaluations on the DIV2K 2017 dataset demonstrate image fidelity between 40.4 dB and 41.35 dB in PSNR and SSIM values consistently above 0.97, surpassing SteganoGAN in computational efficiency and robustness. SteganoSNN establishes a foundation for neuromorphic steganography, enabling secure, energy-efficient communication for Edge-AI, IoT, and biomedical applications.
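To make the 8 bits-per-pixel figure concrete, the sketch below shows one way a payload byte could be spread over the least significant bits of an RGBA pixel. The split of 2 bits into each of the four channels is an assumption made for illustration; the abstract states only the total capacity and the use of LSBs. The spike encoding, modulo-based encryption, and dithering stages are deliberately left out, and the names `embed_bytes` and `extract_bytes` are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int, int]  # (R, G, B, A), each in 0..255


def embed_bytes(pixels: List[Pixel], payload: bytes) -> List[Pixel]:
    """Hide one payload byte per pixel, 2 bits in each of R, G, B, A.

    ASSUMPTION: the 2-bits-per-channel layout is illustrative only.
    """
    if len(payload) > len(pixels):
        raise ValueError("payload exceeds capacity of 1 byte per pixel")
    stego = list(pixels)
    for i, byte in enumerate(payload):
        # Split the byte into four 2-bit chunks, most significant first.
        chunks = [(byte >> shift) & 0b11 for shift in (6, 4, 2, 0)]
        # Clear the two LSBs of each channel and write the chunk there.
        stego[i] = tuple(
            (channel & 0b11111100) | bits
            for channel, bits in zip(pixels[i], chunks)
        )
    return stego


def extract_bytes(pixels: List[Pixel], n: int) -> bytes:
    """Recover n payload bytes from the first n pixels."""
    out = bytearray()
    for r, g, b, a in pixels[:n]:
        out.append(((r & 3) << 6) | ((g & 3) << 4) | ((b & 3) << 2) | (a & 3))
    return bytes(out)


cover = [(200, 100, 50, 255)] * 4
payload = b"SNN"
stego = embed_bytes(cover, payload)
recovered = extract_bytes(stego, len(payload))
```

With this layout each channel value changes by at most 3, which is why LSB schemes of this kind tend to preserve perceptual quality; the PSNR and SSIM values quoted above belong to the authors' full pipeline, not to this sketch.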

Paper Structure

This paper contains 19 sections, 2 equations, 5 figures, 6 tables, 3 algorithms.

Figures (5)

  • Figure 1: Flowchart of the proposed SteganoSNN framework (portrait layout). The pipeline proceeds from audio digitisation and spike-based encoding (LIF neuron, NEST) through modulo-16 mapping and key assignment, followed by steganographic embedding into RGBA images. A PYNQ-Z2 FPGA performs on-chip encryption/decryption. Brace annotations group the Spike-to-Key Encoding and Steganographic Embedding stages.
  • Figure 2: Error bar plot showing minimum, mean, and maximum values of the PSNR_RGB, SSIM_RGB, PSNR_RGBA, and SSIM_RGBA metrics across the DIV2K 2017 dataset. PSNR values are shown relative to 41 dB (PSNR - 41) to highlight small variations; SSIM values are plotted unchanged.
  • Figure 3: Combined error bar plot showing minimum, mean, and maximum values of the SPA, Triples, and WS metrics for the Green (G) channel across the DIV2K 2017 dataset. Each curve connects mean values, while the vertical bars represent the range (minimum to maximum) observed per dataset. SPA (green) quantifies spatial pixel distortion, Triples (blue) measures triple-pixel correlation, and WS (red) captures wavelet-domain similarity. The Green channel is emphasised for its perceptual dominance in human vision.
  • Figure 4: Comparison of (a) original and (b) stego-image. Visual differences are imperceptible, confirming high-fidelity embedding.
  • Figure 5: Original and recovered audio waveforms showing lossless reconstruction.
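The PSNR values behind Figure 2 follow the standard definition, PSNR = 10 log10(peak^2 / MSE). A minimal sketch for 8-bit samples is given below; the function name `psnr` and the flat-sequence input are assumptions for illustration, and the paper's own evaluation code is not shown in the source.

```python
import math
from typing import Sequence


def psnr(original: Sequence[int], stego: Sequence[int], peak: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length sample sequences."""
    if len(original) != len(stego) or len(original) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all samples (channels flattened by the caller).
    mse = sum((o - s) ** 2 for o, s in zip(original, stego)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


# Every sample is off by exactly 1, so MSE = 1 and PSNR = 10 * log10(255^2).
value = psnr([10, 20, 30, 40], [11, 19, 31, 39])
```

Subtracting 41 from such a value gives the offset plotted in Figure 2.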

Theorems & Definitions (3)

  • Example 1
  • Example 2
  • Example 3