SoundSil-DS: Deep Denoising and Segmentation of Sound-field Images with Silhouettes

Risako Tanigawa, Kenji Ishikawa, Noboru Harada, Yasuhiro Oikawa

TL;DR

SoundSil-DS (sound-field-images-with-object-silhouette denoising and segmentation) jointly performs denoising and segmentation of sound fields and object silhouettes on a visualized image. It may improve the post-processing of sound fields, such as physical-model-based three-dimensional reconstruction.

Abstract

Advances in optical technology have enabled imaging of two-dimensional (2D) sound fields. This acousto-optic sensing makes it possible to study interactions between sound and objects, such as reflection and diffraction. Moreover, it is expected to be used as an advanced measurement technology for sonars in self-driving vehicles and assistive robots. However, the low sound-pressure sensitivity of acousto-optic sensing results in high-intensity noise in the images. Denoising is therefore essential for visualizing and analyzing the sound fields. In addition to denoising, segmentation of the sound field and object silhouettes is required to analyze the interactions between them. In this paper, we propose sound-field-images-with-object-silhouette denoising and segmentation (SoundSil-DS), which jointly performs denoising and segmentation of sound fields and object silhouettes on a visualized image. We developed a new model based on the current state-of-the-art denoising network. We also created a dataset through acoustic simulation to train and evaluate the proposed method. The proposed method was evaluated using both simulated and measured data. We confirmed that our method can be applied to experimentally measured data. These results suggest that the proposed method may improve the post-processing of sound fields, such as physical-model-based three-dimensional reconstruction, since it can remove unwanted noise and separate sound fields from object silhouettes. Our code is available at https://github.com/nttcslab/soundsil-ds.

Paper Structure

This paper contains 34 sections, 2 equations, 12 figures, 5 tables.

Figures (12)

  • Figure 1: Conceptual diagram of proposed method. (a) Experimental setup for optical sound measurement, a microphone-free sound measurement technique. (b) Conceptual diagram. Sound field with interacting objects is captured as images with a high-speed camera. Visualized images are converted to denoised and segmented images with a DNN.
  • Figure 2: Overview of our approach. (a) Training process. Two channels of noisy sound images are input into the network. Output images have three channels: the first two channels are for denoising and the last channel is for segmentation. Losses for the denoised and segmented images are calculated separately against their respective ground-truth images. (b) Inference process. Experimentally measured time-sequential images are converted to the frequency domain by Fourier transform (FT). The complex amplitude at each frequency is split into real and imaginary images and input into the trained model. Denoised images of all frequency bins are converted back to time-sequential images by inverse FT. The segmentation image at the sound frequency is extracted as the final segmentation label.
  • Figure 3: Dataset creation. (a) Simulation setup. Sound sources are installed outside observation area. Objects are installed inside observation area. (b) Simulated data. Top row shows object silhouettes, second row shows clean simulated images, and bottom row shows noisy images with noise added to clean images. Color indicates real part of complex amplitude, ranging from $-1.0$ to $1.0$.
  • Figure 4: Qualitative results. Top row shows input images. From second to sixth rows, denoised and segmented images are shown. Last row shows GT images. Left ten columns are for denoising and right ten columns are for segmentation. Input images on right ten columns are same as those in left ten columns. Color indicates real part of complex amplitude.
  • Figure 5: Analyses of results. (a) PSNRs of denoised images relative to SNRs of sound field in input images. PSNRs improved as input SNR increased except for DnCNN. (b) IoUs of segmentation images relative to percentage of object silhouettes' area. IoUs tended to decrease where areas were small.
  • ...and 7 more figures
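
The inference process described in Figure 2(b) can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function name `denoise_sequence`, the argument `sound_bin`, and the placeholder `identity_model` are hypothetical, the model is assumed to be a callable that maps a (2, H, W) array to a (3, H, W) array, and no per-frequency normalization is applied.

```python
import numpy as np


def identity_model(x):
    """Hypothetical stand-in for a trained network (NOT SoundSil-DS).

    Takes a (2, H, W) array (real and imaginary images) and returns a
    (3, H, W) array: two "denoised" channels plus one segmentation channel.
    Here the input is passed through unchanged and the segmentation is zeros.
    """
    seg = np.zeros_like(x[:1])
    return np.concatenate([x, seg], axis=0)


def denoise_sequence(frames, model, sound_bin):
    """Apply a per-frequency model to a time sequence of images.

    frames    : real array of shape (T, H, W), time-sequential images
    model     : callable, (2, H, W) -> (3, H, W)  (assumed interface)
    sound_bin : index of the frequency bin of the sound source (assumed known)
    """
    # Fourier transform along the time axis: one complex image per bin.
    spectrum = np.fft.fft(frames, axis=0)

    denoised = np.empty_like(spectrum)
    segmentation = None
    for k in range(spectrum.shape[0]):
        # Split the complex amplitude into real and imaginary channels.
        x = np.stack([spectrum[k].real, spectrum[k].imag])
        y = model(x)
        # The first two output channels form the denoised complex amplitude.
        denoised[k] = y[0] + 1j * y[1]
        # The third channel at the sound frequency is kept as the label.
        if k == sound_bin:
            segmentation = y[2]

    # Inverse transform back to time-sequential images. Taking the real part
    # assumes the model output stays (approximately) conjugate-symmetric.
    video = np.fft.ifft(denoised, axis=0).real
    return video, segmentation


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.standard_normal((8, 4, 5))
    video, seg = denoise_sequence(frames, identity_model, sound_bin=2)
    print(video.shape, seg.shape)
```

With the pass-through placeholder the output sequence reproduces the input, which makes the transform and channel bookkeeping easy to check before a trained network is substituted for `identity_model`.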