Implicit Steganography Beyond the Constraints of Modality

Sojeong Song, Seoyun Yang, Chang D. Yoo, Junmo Kim

TL;DR

The paper addresses cross-modal steganography with INRSteg. INRSteg represents each secret data item as an Implicit Neural Representation (INR), allocates the weights of the secret INRs inside a larger stego INR, fits the remaining weights on the cover data, and conceals the result with a layer-wise permutation derived from a private key. This allows multiple secrets to be embedded across image, audio, video, and 3D shapes without training a deep neural network for hiding. The paper reports state-of-the-art performance on both cross-modal and intra-modal tasks, high capacity with a compact model (about 0.4 million parameters), robustness under quantization, and low detectability by steganalysis tools. Because all modalities share a single representation, the approach avoids domain adaptation issues and reduces memory and computational cost. The key components are secret-INR allocation, diagonal-block weight updates, and a cryptographic layer-wise permutation keyed by a 128-bit private key, which together provide security and flexibility across diverse data types.
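A key-driven permutation can hide the stego network without changing what it reconstructs because of a standard MLP property: reordering the neurons of a hidden layer, together with the matching columns of the next layer's weight matrix, leaves the network's function unchanged. The sketch below illustrates this for a small sine-activated (SIREN-style) MLP, assuming the permutation acts on the hidden neurons of each layer. The helper names (sine_mlp, layer_perms, apply_perms, encode, decode), the layer sizes, and the key value are hypothetical. Python's random module is used only for illustration; it is not a cryptographically secure generator, and a real system would derive the permutations from the 128-bit key with a keyed cryptographic primitive.

```python
import math
import random


def sine_mlp(weights, biases, x, omega=30.0):
    """SIREN-style forward pass: sin(omega * z) on hidden layers, linear output layer."""
    h = x
    last = len(weights) - 1
    for i, (W, b) in enumerate(zip(weights, biases)):
        z = [sum(w * v for w, v in zip(row, h)) + bj for row, bj in zip(W, b)]
        h = z if i == last else [math.sin(omega * zj) for zj in z]
    return h


def layer_perms(hidden_sizes, key):
    """Derive one permutation per hidden layer from the key.

    Assumption: random.Random (Mersenne Twister) stands in for a keyed
    cryptographic permutation generator; it is NOT secure.
    """
    rng = random.Random(key)
    perms = []
    for size in hidden_sizes:
        p = list(range(size))
        rng.shuffle(p)
        perms.append(p)
    return perms


def apply_perms(weights, biases, perms):
    """Reorder hidden neurons: rows of (W_i, b_i) and columns of W_{i+1} use the same permutation."""
    W = [[row[:] for row in M] for M in weights]
    b = [v[:] for v in biases]
    for i, p in enumerate(perms):
        W[i] = [W[i][j] for j in p]
        b[i] = [b[i][j] for j in p]
        W[i + 1] = [[row[j] for j in p] for row in W[i + 1]]
    return W, b


def invert(p):
    """Inverse permutation: p[invert(p)[j]] == j."""
    inv = [0] * len(p)
    for k, j in enumerate(p):
        inv[j] = k
    return inv


def encode(weights, biases, key):
    sizes = [len(v) for v in biases[:-1]]  # hidden layer widths (output layer is not permuted)
    return apply_perms(weights, biases, layer_perms(sizes, key))


def decode(weights, biases, key):
    sizes = [len(v) for v in biases[:-1]]  # widths are unchanged by permutation
    return apply_perms(weights, biases, [invert(p) for p in layer_perms(sizes, key)])


init = random.Random(0)
dims = [2, 4, 4, 3]  # hypothetical: 2-D coordinates -> two hidden layers of width 4 -> RGB
weights = [[[init.uniform(-1, 1) for _ in range(dims[i])] for _ in range(dims[i + 1])]
           for i in range(len(dims) - 1)]
biases = [[init.uniform(-1, 1) for _ in range(dims[i + 1])] for i in range(len(dims) - 1)]

key = 0x0123456789ABCDEF0123456789ABCDEF  # hypothetical 128-bit private key
stego_w, stego_b = encode(weights, biases, key)
x = [0.25, -0.5]
same = all(math.isclose(u, v, abs_tol=1e-9)
           for u, v in zip(sine_mlp(weights, biases, x), sine_mlp(stego_w, stego_b, x)))
print(same)                                                # True
print(decode(stego_w, stego_b, key) == (weights, biases))  # True
```

The permuted network computes the same outputs (up to floating-point rounding from reordered sums), while recovering the original weight layout requires regenerating the same permutations from the key.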

Abstract

Cross-modal steganography aims to hide secret information of one modality in another modality. Despite the advances that deep learning has brought to steganography, cross-modal steganography remains a challenge. The incompatibility between different modalities not only complicates the hiding process but also increases vulnerability to detection. To address these limitations, we present INRSteg, an innovative cross-modal steganography framework based on Implicit Neural Representations (INRs). We introduce a novel network allocation framework with a masked parameter update, which facilitates hiding multiple secret data items and enables cross-modal hiding across image, audio, video, and 3D shape. Moreover, we eliminate the need to train a deep neural network, which substantially reduces memory and computational cost and avoids domain adaptation issues. To the best of our knowledge, this is the first work in steganography to introduce diverse modalities for both the secret and the cover data. Detailed experiments in extreme modality settings demonstrate the flexibility, security, and robustness of INRSteg.
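The masked parameter update can be pictured as ordinary gradient descent in which a binary mask blocks updates to the weights copied from the secret INRs. The following is a minimal sketch under simplifying assumptions: a single linear layer stands in for the stego MLP, full-batch gradient descent on a mean squared error stands in for the actual fitting procedure, and the toy data, mask, and function names (mse_and_grad, masked_step) are hypothetical.

```python
def mse_and_grad(W, xs, ys):
    """Mean squared error of y_hat = W x over the cover samples, and its gradient with respect to W."""
    n = len(xs)
    loss = 0.0
    grad = [[0.0] * len(W[0]) for _ in W]
    for x, y in zip(xs, ys):
        y_hat = [sum(w * xi for w, xi in zip(row, x)) for row in W]
        for r, (yh, yt) in enumerate(zip(y_hat, y)):
            err = yh - yt
            loss += err * err / n
            for c, xc in enumerate(x):
                grad[r][c] += 2.0 * err * xc / n
    return loss, grad


def masked_step(W, grad, trainable, lr):
    """Gradient step applied only where trainable[r][c] is True; frozen (secret) entries are copied unchanged."""
    return [[w - lr * g if t else w for w, g, t in zip(w_row, g_row, t_row)]
            for w_row, g_row, t_row in zip(W, grad, trainable)]


xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy cover coordinates
ys = [[1.0, 3.0], [2.0, 4.0], [3.0, 7.0]]  # toy cover values, generated by A = [[1, 2], [3, 4]]
W = [[0.5, 0.0], [0.0, 0.0]]               # W[0][0] = 0.5 stands in for a weight copied from a secret INR
trainable = [[False, True], [True, True]]  # mask: False marks frozen secret weights

initial_loss, _ = mse_and_grad(W, xs, ys)
for _ in range(500):
    _, g = mse_and_grad(W, xs, ys)
    W = masked_step(W, g, trainable, lr=0.05)
final_loss, _ = mse_and_grad(W, xs, ys)
print(W[0][0], final_loss < initial_loss)  # 0.5 True
```

The frozen entry keeps its exact value, so the secret weight can be read back bit-for-bit, while the trainable entries still reduce the loss on the cover data.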

Paper Structure (23 sections, 4 equations, 4 figures, 7 tables)

Figures (4)

  • Figure 1: A general framework of INRSteg for hiding two types of secret data. All four data types can be transformed into INRs; the red box shows which of the four data types are selected in this case. After each secret data item is converted into an INR, the weights of the two INRs are allocated within a new MLP network. With the weight freezing technique, the weights from the secret data are kept fixed while the remaining weights are fitted on the cover data. The presence of the secret data is concealed by a permutation encoding based on a private key, which does not affect reconstruction performance. To reveal the secrets, the private key is used to decode the permuted stego network, and each secret data item is recovered by separating its MLP network.
  • Figure 2: An example of allocating two secret INRs, $\theta_{secret1}$ and $\theta_{secret2}$, into the stego INR, $\theta_{stego}$. For $\theta_{secret1}$, $\theta_{secret2}$, and $\theta_{stego}$, the parameters are $(I_1=1, O_1=1, n_1=4, D_1=4)$, $(I_2=3, O_2=3, n_2=3, D_2=5)$, and $(I=2, O=3, n=4, D=9)$, respectively.
  • Figure 3: Visualization results of cross-modal steganography. Can you tell which image is the cover and which is the reconstructed cover? For the video and audio examples, can you tell which are the secret data and which are the revealed secret data? For images, the upper one is the cover data and the lower one is the reconstructed cover data. For video, the upper row is the secret data and the lower row is the revealed secret data. For audio, the left is the secret data and the right is the revealed secret data.
  • Figure 4: A comparison of difference maps (amplified 10x, 20x, and 30x) between the original images and the revealed images. Columns 2-6 show the results of INRSteg and columns 7-10 show the results of DeepMIH [9676416].
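The allocation in Figure 2 can be sketched for a single hidden-to-hidden layer: each secret INR's D_k-by-D_k weight matrix is placed as a diagonal block of the stego matrix, so that D = D_1 + D_2 = 4 + 5 = 9 in the figure's example, and the off-diagonal entries are left free to be fitted on the cover data. This is a minimal sketch under stated assumptions, not the paper's implementation: it ignores the input and output layers and the case n_1 ≠ n_2, and the function names (allocate_hidden_layer, extract_block) and random weights are hypothetical.

```python
import random


def allocate_hidden_layer(secret_mats, secret_biases):
    """Place each secret D_k x D_k hidden weight matrix on the diagonal of a D x D stego matrix.

    Assumption: D = sum of the D_k, as in the Figure 2 example. Off-diagonal entries start
    at 0.0 and are marked trainable so they can later be fitted on the cover data.
    Returns the stego weights, stego bias, trainable mask, and (start, width) block offsets.
    """
    D = sum(len(b) for b in secret_biases)
    W = [[0.0] * D for _ in range(D)]
    bias = [0.0] * D
    trainable = [[True] * D for _ in range(D)]
    offsets = []
    start = 0
    for M, b in zip(secret_mats, secret_biases):
        d = len(b)
        offsets.append((start, d))
        for r in range(d):
            bias[start + r] = b[r]
            for c in range(d):
                W[start + r][start + c] = M[r][c]
                trainable[start + r][start + c] = False  # secret weights are frozen
        start += d
    return W, bias, trainable, offsets


def extract_block(W, bias, offset):
    """Reveal one secret layer by slicing its diagonal block back out of the stego layer."""
    start, d = offset
    return [row[start:start + d] for row in W[start:start + d]], bias[start:start + d]


rng = random.Random(0)
D1, D2 = 4, 5  # hidden widths of the two secret INRs in the Figure 2 example
sec1 = [[rng.uniform(-1, 1) for _ in range(D1)] for _ in range(D1)]
sec2 = [[rng.uniform(-1, 1) for _ in range(D2)] for _ in range(D2)]
b1 = [rng.uniform(-1, 1) for _ in range(D1)]
b2 = [rng.uniform(-1, 1) for _ in range(D2)]

W, bias, trainable, offsets = allocate_hidden_layer([sec1, sec2], [b1, b2])
print(len(W), sum(row.count(True) for row in trainable))  # 9 40
print(extract_block(W, bias, offsets[1]) == (sec2, b2))    # True
```

Of the 81 entries in the 9-by-9 stego matrix, 16 + 25 = 41 come from the secrets and are frozen, leaving 40 off-diagonal entries for the cover; each secret layer is recovered exactly by slicing out its block.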