SilentCipher: Deep Audio Watermarking

Mayank Kumar Singh, Naoya Takahashi, Weihsiang Liao, Yuki Mitsufuji

TL;DR

This work presents the first deep learning-based model to integrate psychoacoustic-model-based thresholding to achieve imperceptible watermarks. It also introduces pseudo-differentiable compression layers, which enhance the robustness of the watermarking algorithm.

Abstract

In audio watermarking, it is challenging to encode imperceptible messages while also increasing message capacity and robustness. Although recent deep learning-based methods improve message capacity and robustness over traditional methods, the encoded messages introduce audible artefacts that restrict their use in professional settings. In this study, we introduce three key innovations. Firstly, our work is the first deep learning-based model to integrate psychoacoustic-model-based thresholding to achieve imperceptible watermarks. Secondly, we introduce pseudo-differentiable compression layers, enhancing the robustness of our watermarking algorithm. Lastly, we introduce a method to eliminate the need for perceptual losses, enabling us to achieve state-of-the-art results in both robustness and imperceptibility. Our contributions lead us to SilentCipher, a model enabling users to encode messages within audio signals sampled at 44.1 kHz.
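
The abstract does not spell out how the pseudo-differentiable compression layers work. One common way to train through a non-differentiable operation is the straight-through trick: the forward pass uses the real lossy output, while the backward pass treats the distortion as a constant, so the gradient passes through unchanged. The sketch below illustrates that general idea only. Uniform quantization as a stand-in for an audio codec, and every function name, are assumptions for illustration, not the authors' implementation.

```python
def quantize(samples, step=0.1):
    """Hypothetical stand-in for a lossy codec: uniform quantization."""
    return [round(s / step) * step for s in samples]


def pseudo_differentiable_forward(samples, step=0.1):
    """Forward pass: output equals the compressed signal.

    The compression error is computed once and then treated as a constant
    added to the input (y = x + const), so dy/dx = 1 in the backward pass.
    """
    compressed = quantize(samples, step)
    residual = [c - s for c, s in zip(compressed, samples)]  # held constant
    return [s + r for s, r in zip(samples, residual)]


def pseudo_differentiable_backward(grad_output):
    """Backward pass: identity, because the residual is a constant."""
    return list(grad_output)


# Usage: gradient of a squared-error loss with respect to the input.
x = [0.234, -0.571, 0.049]
y = pseudo_differentiable_forward(x)          # compressed signal
grad_y = [2.0 * yi for yi in y]               # d/dy of sum(y_i ** 2)
grad_x = pseudo_differentiable_backward(grad_y)  # passed straight through
```

In an automatic-differentiation framework the same effect is usually obtained by detaching the residual from the computation graph, so an upstream encoder still receives a useful gradient even though the codec itself has none.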

Paper Structure

This paper contains 10 sections, 3 equations, 2 figures, and 3 tables.

Figures (2)

  • Figure 1: The model architecture consists of the message transformation network $L$, the encoder network $E$, the carrier decoder network $D_{c}$, the scaler operation $S_{c}$, the ReLU operation $R$, and the message decoder $D_{m}$.
  • Figure 2: Visual analysis of the watermarks